(57) Abstract

Double-stranded cDNA was synthesized from nucleic acid extracted
from Norwalk virus purified from stool specimens of volunteers.
Single-stranded RNA probes derived from the DNA clone after
subcloning into an in vitro transcription vector were also used to
show that the Norwalk virus contains an ssRNA genome of about 8 kb
in size. The availability of a Norwalk-specific cDNA and the genome
sequence information allow rapid cloning of the entire genome and
establishment of sensitive diagnostic assays. Such assays can be
based on detection of Norwalk and Norwalk-related virus nucleic
acids or Norwalk and Norwalk-related viral antigens using probes or
primers and polyclonal or monoclonal antibodies to proteins
expressed from the cDNA or to synthetic peptides made based on the
knowledge of the genome sequence. Assays using proteins deduced
from the Norwalk virus genome and produced in expression systems
can measure antibody responses. Vaccines for Norwalk and related
viruses are made from an expressed Norwalk virus protein.
Methods and Reagents To Detect and Characterize
Norwalk and Related Viruses 

This application is a Continuation-in-Part of Applicant's Co-Pending
U.S. Application Serial No. 07/443,492 filed November 8, 1989,
U.S. Application Serial No. 07/515,993, now abandoned, filed April 27,
1990, U.S. Application Serial No. 07/573,509 filed August 27, 1990, and
U.S. Application Serial No. 07/696,454 filed May 6, 1991, all entitled
"Methods and Reagents To Detect and Characterize Norwalk and Related
Viruses."

This invention is supported in part through grants or awards from
the Food and Drug Administration and the National Institutes of Health.
The United States Government may have certain rights to this invention.

Field of the Invention 
The present invention relates generally to synthesizing clones of 
Norwalk virus and calicivirus and to making probes to Norwalk and
related viruses. It also relates to methods of detection and 
characterization of Norwalk and related viruses. 

Background of the Invention 
Norwalk virus is one of the most important viral pathogens causing
acute gastroenteritis, the second most common illness in the United States
(Dingle et al., Am. J. Hyg. 58:16-30 (1953); Kapikian and Chanock, 
"Norwalk group of viruses" in B.N. Fields' 2d ed. of Virology . Raven Press, 
New York, pp. 671-693 (1990)). Up to 42% of cases of adult viral 
gastroenteritis have been estimated to be caused by Norwalk or
Norwalk-like viruses (Kaplan et al., Ann. Internal Med. 96(6):756-761
(1982)). Both waterborne and foodborne transmission of Norwalk virus have
been documented, and particularly large epidemic outbreaks of illness 
have occurred following consumption of contaminated shellfish, including 
clams, cockles, and oysters (Murphy et al., Med. J. Aust. 2:329-333 (1979);
Gunn et al., Am. J. Epidemiol. 115:348-351 (1982); Wilson et al., Am. J.
Public Health 72:72-74 (1982); Gill et al., Br. Med. J. 287:1532-1534
(1983); DuPont, New Engl. J. Med. 314:707-708 (1986); Morse et al., New
Engl. J. Med. 314:678-681 (1986); Sekine et al., Microbiol. Immunol.
33:207-217 (1989)). An increase in fish and shellfish-related food
poisonings has recently been noted and attributed to increased recognition 
of these entities by clinicians as well as to increased consumption of 
seafood (Eastaugh and Shepherd, Arch. Intern. Med. 149:1735-1740 
(1989)). 

Norwalk virus was discovered in 1973. Until recently, knowledge
about the virus has remained limited because it has failed to grow in cell
cultures and no suitable animal models have been found for virus 
cultivation. Human stool samples obtained from outbreaks and from 
human volunteer studies, therefore, are the only source of the virus. Still,
the concentration of the virus in stool is usually so low that virus
detection with routine electron microscopy is not possible (Dolin et al.,
Proc. Soc. Exp. Biol. Med. 140:578-583 (1972); Kapikian et al., J.
Virol. 10:1075-1081 (1972); Thornhill et al., J. Infect. Dis. 132:28-34
(1975)). Current methods of Norwalk virus detection include immune
electron microscopy and other immunologic methods such as
radioimmunoassays (RIAs) or biotin-avidin enzyme-linked immunosorbent
assays (ELISAs), which utilize acute and convalescent phase serum from
humans. To date, no hyperimmune antiserum from animals has been 
successfully prepared due either to insufficient quantities or unusual
properties of the viral antigen. Preliminary biophysical characterization
of virions has indicated particles contain one polypeptide (Greenberg et
al., J. Virol. 37:994-999 (1981)), but efforts to characterize the viral
genome have failed. 

Viruses related to Norwalk virus include small round enteric
viruses, such as viruses with typical calicivirus morphology and the
astroviruses. The classification scheme for the human small enteric
viruses shown in Table 1 here is an updated version of a scheme outlined
by Caul and Appleton in the Journal of Medical Virology, 9:257-265
(1982). This system is referred to in Cubitt et al., J. Infectious Diseases,
156:806-814 (1987); Table 1 of the article by Appleton entitled "Small 
round viruses: classification and role in food-borne infections", in the book 
Novel Diarrhoea Viruses, Ciba Foundation Symposium No. 128, pp. 108-
125 (John Wiley & Sons, N.Y. (1987)); and Table 1 of the chapter entitled 
"Norwalk group of viruses" by Kapikian and Chanock from the book 
Virology (B.N.Fields, 2d ed., Raven Press (1990)). 

As shown in Table 1, human small round structured enteric viruses
include calicivirus and astrovirus. The recent sequencing of Norwalk
virus indicates that Norwalk virus is a calicivirus and has a genome 
organization like that of other caliciviruses. In addition to the human 
small round enteric viruses are a large number of non-human small round 
viruses which have been classified as astroviruses, caliciviruses, and small
round structured viruses based upon their morphology. Examples of these
viruses are the primate calicivirus isolated from the pygmy chimpanzee, 
described in the journal Science 221:79-81 (1983), a porcine enteric 
calicivirus, described in the Journal of Clinical Microbiology 12:105-111 
(1980), and bovine astroviruses described in Vet Pathol. 21:208-215 (1984). 

Individual calicivirus types will at times exhibit host specificity and tissue
tropisms, but as an overall group they cause gastroenteritis, hepatitis, 
abortion, skin lesions, pneumonia, myocarditis, and encephalitis. The 
caliciviruses infecting humans fit in this context in that Norwalk-like 
viruses cause gastroenteritis, hepatitis E causes hepatitis, and San Miguel
sea lion virus type 5 causes skin vesicles in humans as well as infections
in seals, fish, pigs and cattle (D. O. Matson, "Calicivirus Infections" in
Textbook of Pediatric Infectious Disease, 3d ed., R. D. Feigin and J. D.
Cherry, eds., W. B. Saunders, Philadelphia, (in press)).
Summary of the Invention
It is therefore an object of the invention to detect and characterize
the Norwalk and related virus genomes by synthesizing and cloning a
cDNA library.

It is an associated object of the invention to deduce amino acid
sequences from Norwalk and related viral cDNA.

Another object of the invention is to develop probes or primers to
confirm the genetic relationship between the Norwalk virus and the
Norwalk-related viruses.

Still another object of the invention is to develop a method of
preparing polyclonal and monoclonal antibodies to the Norwalk and
related viruses.

Yet still another object of the invention is to develop a method of
making probes to detect Norwalk and related viruses.

A further object of the invention is to use the cDNA or fragments
or derivatives thereof in assays to detect Norwalk and related viruses in
samples suspected of containing the viruses.

A still further object of the invention is to express proteins to
measure antibody responses.

A nucleotide sequence of the genome sense strand of the Norwalk
virus cDNA clone intended to accomplish the foregoing objects includes
the nucleotide sequence shown in Table 2. Within the Norwalk nucleotide
sequence are regions which encode proteins. The nucleotide sequence of
the Norwalk virus genome, its fragments and derivatives are used to make
diagnostic products, vaccines and antivirals.

Other and still further objects, features and advantages of the
present invention will be apparent from the following description of a
presently preferred embodiment of the invention.

Brief Description of the Figures 
Figure 1. EM picture of Norwalk and related viruses. Norwalk virus (A),
human calicivirus (B), small round structured virus (C), and human
astrovirus (D). The bar is 0.1 µm.

Figure 2a. Hybridization of stool samples with 32P-labeled plasmid DNA
for screening positive Norwalk cDNA clones. Nucleic acids from paired
stools [before (b) and after (a) infection with Norwalk virus] from two
volunteers (1 and 2) were dotted on Zetabind filters. Replicate strips were
prepared and hybridized at 50°C and 65°C with each test clone (pUC-27,
pUC-593, pUC-13 and pUCNV-953). One clone (pUCNV-953) which
reacted only with stool samples after (but not before) Norwalk infection
was considered as a potential positive clone and was chosen for further
characterization.

Figure 2b. Dot blot hybridization of 32P-labeled clone pUCNV-953 with
another 3 sets of stool samples collected at different times after infection
(B = before acute phase of illness; A = acute phase of illness; P =
post-acute phase of illness) of 3 volunteers. The nucleic acids were dotted
directly or after treatment with RNAse or with DNAse before dotting. 
Double-stranded homologous cDNA (pUCNV-953) was dotted after the 
same treatments as the stool samples. 

Figure 3. Dot blot hybridization of Norwalk viruses in a CsCl gradient 
with ssRNA probes made from pGEMNV-953. Aliquots of 50 ul from each
fraction in a CsCl gradient were dotted onto a Zetabind filter. Duplicates 
of filters were made and hybridized with the two ssRNA probes 
respectively. The two strands were subsequently called cRNA (positive 
hybridization with the viral nucleic acid) and vRNA (no hybridization with 
the viral nucleic acid, data not shown). The graph shows EM counts of
Norwalk viruses from each fraction of the same CsCl gradient for the dot 
blot hybridization. Five squares from each grid were counted and the 
average of the number of viral particles per square was calculated. 
Figure 4. The nucleotide sequence of the genome sense strand of the first
Norwalk virus cDNA clone. The deduced amino acid sequence of a long 
open reading frame in this cDNA also is shown. 

Figure 5. Schematic diagram of Norwalk cDNA clones. pUCNV-953 was 
the first positive clone identified. Overlapping clones were determined by
restriction enzyme analyses and partial sequencing of the clones. AAA
indicates the poly(A) tail at the 3' end of the viral genome.

Figure 6. Norwalk virus encodes an RNA-directed RNA polymerase 
sequence motif. The deduced amino acid sequence of a portion of Norwalk
virus pUCNV-4095 (NV) is compared with consensus amino acid residues
thought to encode putative RNA-directed RNA polymerases of hepatitis 
E virus (HEV), hepatitis C virus (HCV), hepatitis A virus (HAV), Japanese 
encephalitis virus (JE), poliovirus (polio), foot-and-mouth disease virus 
(FMD), encephalomyocarditis virus (EMC), Sindbis virus (SNBV), tobacco
mosaic virus (TMV), alfalfa mosaic virus (AMV), brome mosaic virus
(BMV), and cowpea mosaic virus (CpMV). Sequences for viruses other
than NV are from Figure 3 of Reyes et al., Science 247:1335-1339 (1990).

Figure 7. Three pairs of initial primers used to amplify the Norwalk virus 
genome. RNA was extracted from a stool sample (sample 543-11) by the
CTAB technique and amplified by RT-PCR. Lanes 1 and 5, 1-kb markers
from Bethesda Research Laboratories (the markers that migrated as 1.6, 
1.0 and 0.5 kb are labeled); lane 2, PCR with Norwalk virus primers 8 and 
9; lane 3, PCR with Norwalk primers 16 and 17; lane 4, PCR with 
Norwalk primers 1 and 4. The amplified products were separated on the
agarose gel and visualized with UV light after staining with ethidium
bromide. The small product seen in lane 3 was made in variable amount 
in different experiments. The positions of the three primer pairs used in 
this study are given above the autoradiograph. The numbers below the 
map indicate the size (in base pairs) of the RT-PCR product. 



WO 94/05700 



PCT/US93/08447 



Figure 8. This schematic shows the organization of Norwalk genome 
given in Table 2. The features shown here are based on analyses of the 
nucleotide sequence of the Norwalk virus genome and the deduced amino 
acid sequence of proteins encoded in the genome. The genome contains 
7753 nucleotides including 111 A's at the 3'-end. Translation of the
sequence predicts that the genome encodes three open reading frames 
(shown by the open boxes in the second line). The first open reading 
frame is predicted to start from an initiation codon at nucleotide 146 and 
it extends to nucleotide 5359 (excluding the termination codon). The
second open reading frame is initiated at nucleotide 5346 and it extends
to nucleotide 6935, and a third open reading frame exists between 
nucleotides 6938 and 7573. Based on comparisons of these predicted 
proteins with other proteins in the protein databank, the first open
reading frame encodes a protein that is eventually cleaved to make at least three
proteins. These three proteins include a picornavirus 2C-like protein, a
3C-like protease and a 3D-like RNA-dependent RNA polymerase. The 
second open reading frame encodes the capsid protein, which contains 
sequence homology with the picornavirus VP3 protein. 
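
The coordinates quoted for Figure 8 can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the stated positions are 1-based and inclusive and that each range excludes the termination codon (the text says so explicitly only for the first open reading frame). The names ORFS, describe_orf and overlap_nt are illustrative and do not come from the application.

```python
# ORF coordinates as quoted in the Figure 8 description
# (assumed 1-based, inclusive, termination codon excluded).
ORFS = {
    "ORF1": (146, 5359),
    "ORF2": (5346, 6935),
    "ORF3": (6938, 7573),
}


def describe_orf(start, end):
    """Return length in nucleotides, number of codons, and reading frame (1-3)."""
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return {
        "length_nt": length_nt,
        "codons": length_nt // 3,
        # Frame 1 starts at genome position 1, frame 2 at position 2, and so on.
        "frame": (start - 1) % 3 + 1,
    }


def overlap_nt(first, second):
    """Number of nucleotides shared by two inclusive coordinate ranges."""
    return max(0, min(first[1], second[1]) - max(first[0], second[0]) + 1)


if __name__ == "__main__":
    for name, (start, end) in ORFS.items():
        print(name, describe_orf(start, end))
    print("ORF1/ORF2 overlap:", overlap_nt(ORFS["ORF1"], ORFS["ORF2"]), "nt")
```

Under these assumptions the three ranges correspond to 1738, 530 and 212 codons, the first and second open reading frames overlap by 14 nucleotides, and the second frame is shifted relative to the first and third.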

Figure 9. Nucleotide and amino acid sequence of human calicivirus 
Sapporo cDNAs. The 551 nucleotide known sequence of human calicivirus
Sapporo (HuCV Sapporo) is presented in its entirety. Below the 
nucleotide sequence is the amino acid sequence for HuCV Sapporo. Above 
the HuCV Sapporo nucleotide sequence is the sequence of the cDNA from 
a Houston day care center outbreak (Day care). In the Day care sequence 
a "." indicates the nucleotide is identical to the HuCV Sapporo nucleotide
at that site. Where a nucleotide difference occurred in the Day care 
sequence, a new letter is indicated at that position. "N" indicates 
uncertainty of the nucleotide at that site. Below the HuCV Sapporo 
amino acid sequence are arrows, indicating the extent of cDNAs at23s2m31 
and c-29_4-gel (which together contribute to the 551 nucleotides of the
known sequence) and the new 36 primer (see Table 6). 
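
The dot convention described for Figure 9 can be expanded mechanically into a full sequence. The sketch below is illustrative only: the function names are hypothetical and the short sequences are made up for demonstration. They are not taken from the HuCV Sapporo or Day care sequences.

```python
def expand_dot_notation(reference, compared):
    """Replace each '.' in `compared` with the reference base at that position.

    Explicit letters (including 'N' for an uncertain base) are kept as written.
    """
    if len(reference) != len(compared):
        raise ValueError("sequences must be the same length")
    return "".join(r if c == "." else c for r, c in zip(reference, compared))


def count_explicit_differences(compared):
    """Count positions written as an explicit base other than 'N'."""
    return sum(1 for c in compared if c not in ".N")


if __name__ == "__main__":
    reference = "ATGGCA"   # made-up reference fragment
    day_care = "..A.N."    # made-up comparison row in dot notation
    print(expand_dot_notation(reference, day_care))
    print(count_explicit_differences(day_care))
```

Positions marked "N" are carried through unchanged and are not counted as differences, since the figure legend treats them as uncertain rather than as substitutions.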
Figure 10. Nucleotide homologies between calicivirus cDNAs and
calicivirus strains with known sequences. All comparisons are in 
reference to the sequence of human calicivirus Sapporo. The length of the 
baseline indicates the known sequence region. The boxes indicate areas 
of nucleotide sequence homology between HuCV Sapporo and the
indicated strain. The length of the box indicates the part of the indicated
strain where homology exists and the height of the box indicates the
strength of the homology. SD = standard deviation. An SD of 3 or greater is
significant. The numbers under the Norwalk homology box indicate the
region of the Norwalk virus genome where homology was observed.

Figure 11. Strategy used to obtain nucleotide sequence of the Norwalk- 
related virus SRSV/KY/89 using primers from the Norwalk virus sequence. 
This figure shows a partial schematic of the Norwalk virus genome and 
the predicted ORF1 showing the location of the 3D-like polymerase region, 
the second ORF showing the location of the VP3-like domain and the start
of ORF 3. On the bottom, the solid lines show regions of KY89 sequenced 
based on using primer sets (see numbers such as 36 and 35, etc) chosen 
from the sequence of the Norwalk virus genome. 

Figure 12. Comparison of the Norwalk virus nucleotide sequence with the 
Norwalk virus-related virus SRSV/KY/89 nucleotide sequence. Part of the
nucleotide sequence of Norwalk-related virus SRSV/KY/89 was determined 
using primers from the Norwalk-virus (NV) genome. Primers from the 
NV genome used to obtain the sequence of this Norwalk-related virus are 
shown in Table 6. Some of these primers were modified based on the 
initial nucleotide sequence obtained from the SRSV/KY/89 to obtain the
rest of the sequence of SRSV/KY/89. The primers shown here and in 
Table 6 are used by way of example only; other NV primers can be used. 

Figure 13. Comparison of deduced amino acid sequence of proteins of the 
Norwalk virus and the Norwalk-related virus SRSV/KY/89. The protein 
sequence of SRSV/KY/89 was deduced from the nucleotide sequence shown
in Figure 12. Figure 13a shows a comparison of the deduced amino acid
sequence of ORF2, the capsid, of SRSV/KY/89 with the same region
encoded in the Norwalk virus genome. Figure 13b shows a comparison of
the deduced amino acid sequence of part of the polymerase protein of
SRSV/KY/89 with that of Norwalk virus. Comparisons of similar 
sequences from other Norwalk-related viruses will permit discovery of 
conserved and divergent regions including antigenic regions. The 
information will rapidly permit choices of broadly reactive primers to 
detect all Norwalk-related viruses and specific primer sets to detect
individual Norwalk-related viruses. Similarly, fragments and peptides
with common amino acid sequences or specific amino acid sequences can 
be selected for development of diagnostics, vaccines and antivirals. 



Figure 14. Comparison of partial nucleotide sequences of Norwalk virus 
and six Norwalk-related viruses obtained using primers from the NV
genome. Sequences from SRSV/CDC 6/91, SRSV/UT/88, SMA/78; 
SRSV/Cambridge, UK/92, SRSV/CDC 32, Norwalk virus/68, SRSV-3/88, 
SRSV/KY89/89. Figures 14a and 14b show two different regions of the 
genome. 



Figure 15. Expression of the Norwalk virus capsid protein. Baculovirus
recombinants (C-6 and C-8) that contain a subgenomic piece of Norwalk 
virus DNA (from nucleotides 5337 to 7753) were selected and used to 
infect insect (Spodoptera frugiperda) cells at a multiplicity of infection of
10 PFU/cell. After 4 days of incubation at 27°C, the infected cells were
harvested and the proteins were analyzed by electrophoresis on 12%
polyacrylamide gels. The proteins were visualized after staining with 
Coomassie blue. The Norwalk-expressed protein (highlighted by the 
arrowhead) is only seen in the recombinant-infected cells, but not in wild- 
type baculovirus (wt) or mock-infected (m) insect cells. 
Figure 16. The Norwalk virus expressed protein shows immunoreactivity
with sera from volunteers infected with Norwalk virus. The expressed
protein shown in Figure 15 was adsorbed onto the wells of a 96-well
ELISA plate and its reactivity was tested with dilutions of serum samples
taken from volunteers before (pre) and three weeks after (post) infection
with Norwalk virus. After an incubation at 37° C for 2 hours, a 
peroxidase-conjugated goat-anti-human IgG, IgM and IgA serum was 
added and reactivity was subsequently observed by reading the optical 
density at 414nm after addition of the substrate. The data show that 
post-infection sera reacted strongly with the expressed antigen at serum
dilutions of 1:100 and 1:1000, and some sera were still specifically reactive 
at a dilution of 1:10,000. 

Figure 17. Baculovirus recombinants containing the 3'-end of the 
Norwalk genome produce virus-like particles in insect cells. Lysates from 
insect cells infected with baculovirus recombinant C-8 (see Figure 15) were
analyzed by electron microscopy and shown to contain numerous virus-like 
particles. These particles are the same size as virus particles obtained 
from the stools of volunteers infected with Norwalk virus. Bar = 50 nm. 

Figure 18. Norwalk virus-like particles can be purified in gradients of 
CsCl. Supernatants of insect cells infected with the baculovirus
recombinant C-8 were processed by extraction with genetron and PEG
precipitation and virus eluted from these PEG pellets was centrifuged in
a CsCl gradient in a SW50.1 rotor for 24 hours at 4°C. The gradient was
fractionated and material in each fraction was adsorbed onto two wells of
an ELISA plate. Duplicate wells were then treated either with pre- or
post-infection serum, peroxidase-conjugated goat anti-human serum and
substrate and the reactions were monitored by reading the OD414nm. A
peak was observed in the gradient at a density of 1.31 g/cm3 and this peak
was shown to contain virus-like particles by electron microscopy. This
peak also contained a major protein of an approximate molecular weight
of 58,500 that co-migrated with the protein expressed in the insect cells
from the same baculovirus recombinant. 

Figure 19. Use of the expressed virus-like particles to measure the 
reactivity of pre- and post-serum samples from volunteers infected with 
Norwalk virus shows that most volunteers have an immune response.
Volunteer 6 who did not show an immune response also did not become 
ill after being administered virus. 

Figure 20. Partial sequence of the primate Pan paniscus cDNA atprcvw2. 

Detailed Description of the Invention 

It is readily apparent to one skilled in the art that various
substitutions and modifications may be made to the invention disclosed
herein without departing from the scope and spirit of the invention. 

The term "fragment" as used herein is defined as any portion of the
Norwalk virus genome or a subgenomic clone of the Norwalk virus that
is required to be expressed to produce or encodes a peptide which in turn
is able to induce a polyclonal or monoclonal antibody. It is possible a 
peptide of only 5 amino acids could be immunogenic but usually peptides 
of 15 amino acids or longer are required. This depends on the properties 
of the peptide and it cannot be predicted in advance. 

The term "derivative" as used herein is defined as larger pieces of
DNA or an additional cDNA which represents the Norwalk virus genome
and which is detected by direct or sequential use of the original cDNA and 
any deduced amino acid sequences thereof. Clone pUCNV-1011, therefore, 
is a derivative, although it does not overlap or share sequences with the
original clone. Also included within the definition of derivative are RNA
counterparts of DNA fragments and DNA or cDNA fragments in which 
one or more bases have been substituted or to which labels and end 
structures have been added without affecting the reading or expression of 
the DNA or cDNA. 
The terms "Norwalk-related viruses" and "Norwalk-like viruses" as
used herein are defined as human and non-human calicivirus, astrovirus
and small round structured viruses (SRSV). As the genomic sequences of
most of these viruses are not known, this classification is based on
morphology as described by Caul and Appleton in the Journal of Medical
Virology, 9:257-265 (1982); by Appleton in the article entitled "Small
round viruses: classification and role in food-borne infections", in the book
Novel Diarrhoea Viruses, Ciba Foundation Symposium No. 128, pp. 108-
125 (John Wiley & Sons, N.Y. (1987)); and by Kapikian and Chanock in
the chapter entitled "Norwalk group of viruses" from the book Virology
(B.N.Fields, 2d ed., Raven Press (1990)). As the genomic sequences of the 
viruses become known, those skilled in the art will be able to determine 
Norwalk-related viruses and Norwalk-like viruses based on nucleotide 
homologies. 

Within the Norwalk-related viruses is a subgroup of viruses referred
to herein as the SRSV's or the Norwalk group. The Norwalk group
includes Snow Mountain Agent (SMA), Hawaii Agent, Taunton Agent, 
Amulree, Otofuke, and Montgomery County Agent. The Norwalk group 
is characterized by small, round, structured viruses with an amorphous
surface or ragged outline.

Production of Norwalk Virus for Molecular Cloning 

Norwalk virus was produced by administration of safety tested 
Norwalk virus (8FIIa) to adult volunteers. The virus inoculum used in the
volunteer study was kindly supplied by Dr. Albert Kapikian (Laboratory
of Allergy and Infectious Diseases, National Institutes of Health, Bethesda,
MD). This virus originated from an outbreak of acute gastroenteritis in 
Norwalk, Ohio (Dolin et al., 1971). Two ml of a 1 to 100 dilution of 8FIIa 
in TBS was administered orally to each individual with 80 ml of milli-Q 
water (Millipore, Bedford, MA 01730). Sodium bicarbonate solution was
taken by each person 2 minutes before and 5 minutes after virus
administration. The volunteer studies were approved by the Institutional
Review Board for Human Research at Baylor College of Medicine, at the
Methodist Hospital and at the General Clinical Research Center. The 
virus was administered to the volunteers in the General Clinical Research 
Center where the volunteers were hospitalized and under extensive 
medical care for 4 days. All stools were collected and kept at -70°C for
later use. 

Purification of Norwalk Viruses from Stool Samples 

A 10% solution of stool samples in TBS was clarified by low speed 
centrifugation at 3000 rpm for 15 minutes. The resulting supernate then
was extracted two to three times with genetron in the presence of 0.5%
Zwittergent 3-14 detergent (Calbiochem Corp., La Jolla, CA). Viruses in 
the aqueous phase were concentrated by pelleting at 36,000 rpm for 90 
minutes through a 40% sucrose cushion in a 50.2 Ti rotor (Beckman 
Instruments, Inc., Palo Alto, CA 94304). The pellets were suspended in
TBS and mixed with CsCl solution (refractive index 1.368) and centrifuged
at about 35,000 rpm for about 24 hours in a SW50.1 rotor (Beckman). 
The CsCl gradient was fractionated by bottom puncture and each fraction 
was monitored for virus by EM examination. The peak fractions 
containing Norwalk virus were pooled and CsCl in the samples was
diluted with TBS and removed by pelleting the viruses at about 35,000
rpm for 1 hour. The purified virus was stored at about -70°C. 

Extraction of Nucleic Acids from Purified Virus 

One method of extraction involved treating purified Norwalk virus 
from CsCl gradients with proteinase K (400 ug/ml) in proteinase K buffer
(0.1 M Tris-Cl pH 7.5, 12.5 mM EDTA, 0.15 M NaCl, 1% w/v SDS) at
about 37°C for about 30 minutes. The samples were then extracted once 
with phenol-chloroform and once with chloroform. Nucleic acids in the 
aqueous phase were concentrated by precipitation with 2.5 volumes of 
ethanol in the presence of 0.2 M NaOAc followed by pelleting for 15
minutes in a microcentrifuge.
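
The precipitation step above reduces to simple volume arithmetic. The following sketch rests on two assumptions that the text does not state: that the 0.2 M NaOAc refers to the aqueous phase before ethanol is added, and that a 3 M NaOAc stock is used. The function name and parameters are hypothetical.

```python
def ethanol_precipitation_volumes(sample_ul,
                                  naoac_stock_molar=3.0,   # assumed stock, not from the text
                                  naoac_final_molar=0.2,   # from the text
                                  ethanol_volumes=2.5):    # from the text
    """Return microlitre volumes of NaOAc stock and ethanol for one sample.

    Solves C_stock * V = C_final * (V_sample + V) for the NaOAc volume V,
    then adds `ethanol_volumes` volumes of ethanol relative to the combined
    aqueous volume.
    """
    if sample_ul <= 0:
        raise ValueError("sample volume must be positive")
    if naoac_stock_molar <= naoac_final_molar:
        raise ValueError("stock must be more concentrated than the target")
    naoac_ul = naoac_final_molar * sample_ul / (naoac_stock_molar - naoac_final_molar)
    aqueous_ul = sample_ul + naoac_ul
    return {
        "naoac_ul": naoac_ul,
        "aqueous_ul": aqueous_ul,
        "ethanol_ul": ethanol_volumes * aqueous_ul,
    }


if __name__ == "__main__":
    for key, value in ethanol_precipitation_volumes(100.0).items():
        print(f"{key}: {value:.1f}")
```

If the 0.2 M figure was instead meant as the concentration after ethanol addition, the NaOAc volume would need to be recalculated against the total volume.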
cDNA Synthesis and Cloning of Amplified cDNA

One method of synthesis and cloning included denaturing nucleic 
acids extracted from the purified Norwalk viruses with 10 mM CH3HgOH.
Then cDNA was synthesized using the cDNA synthesis kit with the 
supplied random hexanucleotide primer (Amersham, Arlington Heights,
IL 60005). After the second strand synthesis, the reaction mixture was 
extracted once with phenol-chloroform and once with chloroform followed 
by ethanol precipitation. Amplification of DNA was performed using the 
random prime kit for DNA labeling (Promega Corp., Madison, WI 
53711-5305). Eight cycles of denaturation (100°C for 2 minutes),
reannealing (2 minutes cooling to room temperature) and elongation 
(room temperature for 30 minutes) were performed after addition of 
Klenow fragment (Promega Corp.). A DNA library was constructed in 
pUC-13 with blunt-end ligation into the Sma I site. 

Screening of the Library for Positive Clones

As one method of screening, white colonies from transformed DH5 
alpha bacterial cells (BRL) were picked and both a master plate and 
minipreps of plasmid DNA were prepared for each clone. Clones 
containing inserts were identified after electrophoresis of the plasmid
DNA in an agarose gel. The insert DNA in the agarose gel was cut out
and labeled with 32P using random primers and Klenow DNA polymerase
such as in the PRIME-A-GENE® labeling system (Promega Corp.). Other 
isotopic or biochemical labels, such as enzymes, and fluorescent, 
chemiluminescent or bioluminescent substrates can also be used. Nucleic
acids extracted from paired stool samples (before and after Norwalk
infection) from two volunteers (543 and 544) were dotted onto Zetabind 
filters (AFM, Cuno, Meriden, CT). Replicate filter strips were prepared 
and hybridized with each labeled plasmid probe individually at 65°C 
without formamide. Potential positive clones were judged by their
different reactions with the pre- and post-infection stools. Clones which
reacted with post (but not pre-) infection stools of volunteers were
considered positive and these clones on the master plates were
characterized further. Once one Norwalk clone was identified, it was used
to rescreen the cDNA library to identify additional overlapping clones. 
Rescreening the cDNA library with these additional clones can ultimately 
identify clones representing the entire Norwalk virus genome. 
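
The selection rule used in this screen (reactive with post-infection stool, not reactive with pre-infection stool) can be written as a small predicate. The sketch below is illustrative only. The clone labels and reactivity calls are hypothetical placeholders rather than data from the screen; only the volunteer identifiers 543 and 544 come from the text.

```python
def is_candidate_positive(reactivity):
    """Apply the pre/post rule to one clone.

    `reactivity` maps a volunteer identifier to a (pre_reactive, post_reactive)
    pair of booleans. A clone is a candidate only if, for every volunteer, the
    probe reacted with the post-infection stool and not with the pre-infection
    stool.
    """
    if not reactivity:
        raise ValueError("no dot blot results supplied")
    return all(post and not pre for pre, post in reactivity.values())


if __name__ == "__main__":
    # Hypothetical dot blot calls for three placeholder clones.
    screen = {
        "clone_A": {"543": (False, True), "544": (False, True)},
        "clone_B": {"543": (True, True), "544": (True, True)},
        "clone_C": {"543": (False, False), "544": (False, False)},
    }
    candidates = [name for name, calls in screen.items()
                  if is_candidate_positive(calls)]
    print(candidates)
```

Requiring the pattern in every volunteer is a design choice of this sketch; the text does not say how discordant results between volunteers were handled.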

Reverse Transcriptase-Polymerase Chain Reaction Production of cDNA
Clones from Viruses Related to Norwalk Virus 

One method for producing cDNA clones of viruses related to 
Norwalk virus using the knowledge of the Norwalk virus genome sequence 
is the reverse transcription-polymerase chain reaction method. In this
procedure, RNA was extracted from 300 uL of specimen containing the
related virus. Complementary DNA was prepared by reverse 
transcriptase-polymerase chain reaction (RT-PCR) using a primer pair (for 
example primers 36 and 35 shown in Table 6) derived from the sequence 
of Norwalk virus. The resulting product was ligated into a plasmid vector
and transfected into E. coli. Plasmids then were partially purified from
the bacteria and the inserted PCR product was sequenced in the plasmid 
by dideoxy chain termination to examine the relation to Norwalk virus by 
nucleotide and predicted protein homology. 
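
One simple way to express nucleotide homology between a sequenced RT-PCR product and the corresponding Norwalk virus region is percent identity over an existing alignment. The sketch below assumes the two sequences have already been aligned to equal length, with "-" marking gaps and "N" marking uncertain bases. It is not the comparison method used in the application, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of compared positions at which two aligned sequences agree.

    Positions where either sequence has a gap ('-') or an uncertain base ('N')
    are skipped rather than counted as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Made-up fragments for demonstration only.
    print(round(percent_identity("ATGGCA", "ATGACA"), 1))
```

Skipping gapped and uncertain positions is one convention among several; counting gaps as mismatches would give lower values for the same alignment.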

The following examples are offered by way of illustration and are not
intended to limit the invention in any manner.

Example 1 
Electron micrograph confirmation 
To permit better diagnosis and molecular characterization of 
Norwalk virus and related viruses, a cDNA library for Norwalk was 
derived from nucleic acid extracted from virions purified from stool
samples. Norwalk virus was purified with methods used previously for 
hepatitis A and rotaviruses from stool samples with some modifications 
(Jiang et al., 1986). Basically, stool samples obtained from volunteers 
administered Norwalk virus were treated with genetron to remove lipid 
30 and water insoluble materials. Virus in the aqueous phase was then 
pelleted through a 40% sucrose cushion. The resulting pellets were 
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resuspended, sonicated and loaded in a CsCl gradient for isopycnic 
centrifugation. 

Figure 1 shows electron micrographs of purified Norwalk viruses
isolated by the above procedure and of Norwalk-related viruses used to
produce cDNAs using RT-PCR.

Example 2 

Initial cDNA synthesis, cloning and screening 
A cDNA library was generated from nucleic acids extracted from
these purified viruses by proteinase K treatment of the samples followed
by phenol-chloroform extraction and ethanol precipitation (Jiang et al.,
1986; 1987). Because the nature of the viral genome was unknown, the
extracted nucleic acids were denatured with methylmercuric hydroxide
before cDNA synthesis. Random-primed cDNA was synthesized with the
Gubler-Hoffman method (cDNA synthesis system plus, Amersham) and a
small amount of cDNA was obtained. Direct cloning of this small amount
of cDNA was unsuccessful. Therefore, the DNA was amplified before
cloning by synthesizing more copies with random primers and the Klenow
fragment of DNA polymerase. The procedure involved cycles of
denaturation, addition of random primers and the Klenow fragment of DNA
polymerase, reannealing and elongation. With this procedure, a linear
incorporation of labeled nucleotides into product was observed as the
number of cycles of synthesis was increased. The number of cycles
performed was limited (<10) to avoid the synthesis of an excess of
smaller fragments. In the case of Norwalk cDNA, eight cycles of
amplification were performed and approximately 2.5 µg of DNA were
obtained, which was at least a 100-fold amplification of the starting
template cDNA. This amplified cDNA was cloned into pUC-13 by
blunt-end ligation and a positive clone (pUCNV-953) was isolated.
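
The yield figures in the preceding paragraph imply an upper bound on the
amount of starting template. The following is a minimal sketch of that
arithmetic only, assuming the approximately 2.5 µg yield and the "at least
100-fold" figure stated above; the function name is illustrative and is not
part of the described procedure.

```python
def max_starting_template_ng(final_yield_ug: float, min_fold: float) -> float:
    """Upper bound on the starting cDNA (in ng) implied by a final yield
    (in ug) and a minimum fold amplification."""
    return final_yield_ug * 1000.0 / min_fold


# About 2.5 ug obtained after at least a 100-fold amplification.
print(max_starting_template_ng(2.5, 100))
```

Under these assumptions the starting cDNA would have been no more than about
25 ng, which is consistent with the statement that direct cloning of the
unamplified material was unsuccessful.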

To obtain the positive Norwalk virus clone, minipreparations of the
plasmid DNAs containing potential inserts were screened by agarose gel
electrophoresis. Inserts of the larger clones in the gel were cut out and
probes were made with the DNA in the gel using the PRIME-A-GENE®
labeling system (Promega Corp.). These probes were hybridized
individually with paired stool samples (before and after Norwalk infection)
from two volunteers (Figure 2a). One clone (pUCNV-953) reacted with
post- but not pre-infection stool samples from both volunteers.

Example 3

Confirmation of viral origin of the clone pUCNV-953 
To further confirm the viral origin of the clone pUCNV-953, six
more paired stool samples were tested and the same results were obtained.
Figure 2b shows a dot blot hybridization of the clone with stool samples
collected at different times post-infection. Strong signals
were observed only with stools from the acute phase, but not before or
after the illness. This result was consistent with previous RIAs for viral
antigen detection using convalescent sera from volunteers with Norwalk
diarrhea and with immune electron microscopy (IEM) studies of the samples
for viral particle examination. This result also agrees with the patterns
of virus shedding in stool in the course of the disease (Thornhill et al.,
1975). When the pUCNV-953 clone was hybridized with fractions of a
CsCl gradient from the Norwalk virus purification scheme, an excellent
correlation between hybridization and EM viral particle counts was
observed (Figure 3). The peaks of the hybridization signals and viral
particle counts both were at fractions with a density of 1.38 g/cm³, which
agrees with previous reports of the biophysical properties of Norwalk
virus. Finally, the clone was tested by hybridization with highly purified
Norwalk virus electrophoresed on an agarose gel. A single hybridization
band was observed with Norwalk virus but not with HAV or rotavirus.
Sequence analysis of the pUCNV-953 cDNA showed this clone is 511 bp
(Figure 4). This partial genomic cDNA encodes a potential open reading
frame for which the amino acid sequence has been deduced (Figure 4). No
significant nucleotide or deduced amino acid sequence homology was
found by comparison with other sequences in GenBank (Molecular
Biology Information Resource, Eugene Software, Baylor College of
Medicine).

Example 4

Use of Norwalk virus cDNA to characterize the viral genome 
The pUCNV-953 cDNA was subcloned into the transcription vector
pGEM-3Zf(+) and grown. ssRNA probes were then generated by in vitro
transcription using SP6 and T7 polymerases (Bethesda Research
Laboratory). When two opposite sense ssRNA probes were hybridized
with the viral nucleic acid separately, only one strand reacted with the
virus, indicating the viral genome is single-stranded. As shown in Figure
2b, the hybridization signals were removed by treatment of the viral
nucleic acid with RNase (but not with DNase) before loading onto
the filters, indicating the virus genome contains ssRNA. A long open
reading frame was found in one of the two strands of the inserted DNA
by computer analysis of the sequence of pUCNV-953. The ssRNA
probe with the same sequence as this coding strand does not react with
the viral nucleic acid, but the complementary ssRNA probe does react in
the hybridization tests. Therefore, Norwalk virus contains a positive
sense single-stranded RNA genome. The size of the genome of Norwalk
virus was estimated to be about 8 kb based on comparisons of the
migration rate of the purified viral RNA in agarose gels with molecular
weight markers.

The pUCNV-953 cDNA was used to rescreen a second cDNA library
made as follows. To synthesize a clone of the Norwalk or related virus,
nucleic acid was isolated from purified Norwalk virus; cDNA was
synthesized using reverse transcriptase and random primers; a second
strand of DNA was synthesized from the cDNA; and at least one copy of
DNA was inserted into a plasmid or a cloning and expression vector.
Screening the library with the original pUCNV-953 cDNA identified clones
containing fragments of (or the complete) Norwalk or related genome.
Alternatively, at least one copy of DNA was inserted in a cloning and
expression vector, such as lambda ZAPII® (Stratagene Inc.), and the cDNA
library was screened to identify recombinant phage containing fragments
of or the complete Norwalk or related genome. Additional cDNAs were
made and found with this method. Use of these additional cDNAs to
rescreen the library resulted in detection of new clones (Figure 5).

Thus, those skilled in the art will recognize that the entire Norwalk
virus cDNA sequence, or fragments or derivatives thereof, can be used in
assays to detect the genome of Norwalk and other related viruses. The
detection assays include labeled cDNA or ssRNA probes for direct
detection of the Norwalk virus genome and measurement of the amount
of probe binding. Alternatively, primers or small oligonucleotide probes
(10 nucleotides or greater) and polymerase chain reaction amplification
are used to detect the Norwalk and Norwalk-related virus genomes.
Expression of the open reading frame in the cDNA is used to make
hyperimmune or monoclonal antibodies for use in diagnostic products,
vaccines and antivirals.

Using the above methodology, the nucleotide sequence in Table 2
was identified. Within that nucleotide sequence, the encoding regions for
several proteins have been identified. In that sequence, the first protein
is encoded by nucleotides 146 through 5339 and the amino acid sequence
is shown in Table 3. This first protein is eventually cleaved to make at
least three proteins including a picornavirus 2C-like protein, a 3C-like
protease and an RNA-dependent RNA polymerase. The RNA-dependent
RNA polymerase is deduced from nucleotides 4543 to 4924 of the Norwalk
virus genome as shown in Table 3. The fact that this portion of the
genome contains an RNA polymerase is verified by comparisons with RNA
polymerases in other positive sense RNA viruses (Figure 6; SEQ ID NOS:
38 through 50).

Also in the sequence in Table 2, two other protein encoding regions 
were found. They are encoded by nucleotides 5346 through 6935 and 
nucleotides 6938 through 7573. The amino acid sequences for these two 
proteins are shown in Tables 4 and 5, respectively. 
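
The coordinates given above can be tabulated to see how long each coding
region is and in which reading frame it starts. The sketch below is only
illustrative: it assumes the nucleotide positions exactly as printed in the
text (1-based, inclusive), and the helper names are hypothetical. A nonzero
remainder would simply indicate that the printed endpoints do not bracket a
whole number of codons.

```python
# Coordinates exactly as printed in the text (1-based, inclusive).
CODING_REGIONS = {
    "first protein (Table 3)": (146, 5339),
    "second protein (Table 4)": (5346, 6935),
    "third protein (Table 5)": (6938, 7573),
}


def span_length(start: int, end: int) -> int:
    """Number of nucleotides in an inclusive 1-based interval."""
    return end - start + 1


def reading_frame(start: int) -> int:
    """Frame (1, 2 or 3) of a 1-based start position relative to nucleotide 1."""
    return (start - 1) % 3 + 1


for name, (start, end) in CODING_REGIONS.items():
    n = span_length(start, end)
    print(f"{name}: {n} nt, frame {reading_frame(start)}, "
          f"{n // 3} full codons, {n % 3} nt left over")
```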



WO 94/05700 



PCT/US93/08447

Example 5 

Diagnostic assays based on detection of the 
sequences of the Norwalk virus genome 
Hybridization assays are the assays of choice to detect Norwalk virus
because small amounts of virus are present in clinical or contaminated
water and food specimens. Previously, detection of Norwalk and related
nucleic acids was not possible because the genome of Norwalk virus was
not known and no sequence information was available. Probes made from
the Norwalk virus cDNA or primers made from the Norwalk virus genome
sequence allow methods to amplify the genome for diagnostic products to
be established. Probes to identify Norwalk virus alone and to identify
other Norwalk-related viruses enable development of either specific assays
for Norwalk or general assays to detect sequences common to many or all
of the Norwalk-related agents.

In the past, one major difficulty encountered in RT-PCR detection
of viral RNA in stool samples was that uncharacterized factor(s) are
present in stools which inhibit the enzymatic activity of both reverse
transcriptase and Taq polymerase (Wilde et al., J. Clin. Microbiol.
28:1300-1307, 1990). These factor(s) were difficult to remove by routine
methods of nucleic acid extraction. Techniques were developed using
cetyltrimethylammonium bromide (CTAB) and oligo d(T) cellulose
specifically to separate viral RNA from the inhibitory factor(s). These
techniques were based on the unique properties of CTAB, which selectively
precipitates nucleic acid while leaving acid insoluble polysaccharide in the
supernatant. The resulting nucleic acid was further purified by adsorption
onto and elution from oligo d(T) cellulose. This step removes unrelated
nucleic acids that lack a poly(A) tail. With this technique, Norwalk virus
was detected easily by PCR in very small amounts (400 µl of a 10%
suspension) of stool sample. For example, one skilled in the art will
recognize that it is now possible to clone the genome of RNA viruses
present in low concentrations in small amounts of stool after RT-PCR and
a step of amplification of the viral RNA by RT-PCR using random
primers. In some cases, RT-PCR active nucleic acids are extracted with
CTAB and without oligo d(T) cellulose. In addition, now that the
inhibitor(s) can be removed from stool, it is also possible to detect and
clone nucleic acids of other viruses (DNA viruses, non-poly(A) tailed RNA
viruses) present in stool.

The CTAB and oligo d(T) cellulose technique of extraction followed
by detection of viral RNA with RT-PCR was used on stool samples and
could be used on water and food samples. Stool sample was suspended in
distilled water (about 10% wt/vol) and extracted once with genetron.
Viruses in the supernatant were precipitated with polyethylene glycol at
a final concentration of about 8%. The viral pellets were treated with
proteinase K (about 400 µg/ml) in the presence of SDS at about 37°C for
about 30 minutes followed by one extraction with phenol-chloroform and
one with chloroform. A solution of about 5% CTAB and about 0.4M NaCl
was added at a ratio of sample:CTAB equal to about 5:2. After incubation
at about room temperature for about 15 minutes and at about 45°C for
about 5 minutes, the nucleic acids (including the viral RNA) were collected
by centrifugation in a microcentrifuge for about 30 minutes. The
resultant pellets were suspended in about 1M NaCl and extracted twice
with chloroform. The viral RNA in the aqueous phase was used directly
in RT-PCR reactions or further purified by adsorption/elution on oligo
d(T) cellulose.
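
The mixing ratio stated above determines the approximate working
concentrations in the precipitation step. The following sketch is a simple
dilution calculation, not part of the described protocol; it assumes that the
sample itself contributes no CTAB or NaCl, that volumes are additive, and
that the 5% figure is a stock concentration. The function name is
illustrative.

```python
def final_concentration(stock: float, stock_parts: float, sample_parts: float) -> float:
    """Concentration of a stock component after mixing stock and sample
    in the given volume ratio (assumes additive volumes)."""
    return stock * stock_parts / (stock_parts + sample_parts)


# Sample:CTAB solution mixed at about 5:2; stock is about 5% CTAB, 0.4 M NaCl.
ctab_pct = final_concentration(5.0, stock_parts=2, sample_parts=5)
nacl_molar = final_concentration(0.4, stock_parts=2, sample_parts=5)
print(f"CTAB about {ctab_pct:.2f}%, NaCl about {nacl_molar:.3f} M")
```

Under these assumptions the mixture contains roughly 1.4% CTAB and a little
over 0.1 M NaCl contributed by the stock solution.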

A batch method of adsorption/elution on oligo d(T) cellulose was
used to purify poly(A) tailed RNA. In this procedure, nucleic acids
partially purified as described above or RNA extracted directly with
phenol-chloroform (without CTAB treatment) were mixed with oligo d(T)
cellulose (about 2-4 mg/sample) in a binding buffer (about 0.5M NaCl and
10mM Tris, pH 7.5). The mixture was incubated at about 4°C for about
1 hr with gentle shaking and then centrifuged for about 2 minutes in a
microcentrifuge. The oligo d(T) cellulose pellet was washed 3-4 times with
binding buffer and then the poly(A) tailed RNA was eluted with 1X TE
buffer (about 10mM Tris, 1mM EDTA, pH 7.5). The supernate was
collected following centrifugation to remove the oligo d(T) cellulose and
the viral RNA in the supernate was precipitated with ethanol. The RNA
obtained at this stage was basically inhibitor-free and able to be used in
RT-PCR.

In preliminary experiments, Norwalk virus RNA was detected in less
than 0.05 g of stool samples using the CTAB technique. A trace inhibitor
activity was observed with RNA extracted with either CTAB or oligo d(T)
alone, but this was easily removed by dilution (1:2) of the extracted
nucleic acid before RT-PCR. Combination of the CTAB and oligo d(T)
techniques resulted in obtaining high quality, inhibitor-free RNA which
could be used directly for RT-PCR detection and for cloning of the viral
genome. With development of this method to clone from small amounts
of stool, one skilled in the art will know that they can obtain cDNAs for
the remainder of the genome including those representing the 5'-end of
the genome.

For detection with PCR, primers based on the above nucleotide
sequence of the genome were made by chemical methods. These primers
include: Primer 1: CACGCGGAGGCTCTCAAT located at nucleotides
7448 to 7465; Primer 4: GGTGGCGAAGCGGCCCTC located at
nucleotides 7010 to 7027; Primer 8: TCAGCAGTTATAGATATG located
at nucleotides 1409 to 1426; Primer 9: ATGCTATATACATAGGTC
located at nucleotides 612 to 629; Primer 16: CAACAGGTACTACGTGAC
located at nucleotides 4010 to 4027; and Primer 17:
TGTGGCCCAAGATTTGCT located at nucleotides 4654 to 4671 (SEQ ID
NOS: 51 through 56, respectively). These primers have been shown to be
useful to detect virus using reverse transcription and polymerase chain
reaction methods (RT-PCR). Figure 7 shows data using these primers.
In primer sets 1 and 4, 8 and 9, and 16 and 17, the reverse complements
of the sequences given above for primers 1, 8, and 17 were used.
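
The use of reverse complements for the downstream primer of each pair can
be illustrated with a short script. This is a sketch only: the primer
sequences and coordinates are those listed above, while the helper names and
the expected product sizes (computed here as the span between the outermost
primer coordinates on the Norwalk genome) are illustrative assumptions and
are not values reported in the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Primer number -> (sequence as listed, first nt, last nt on the genome).
PRIMERS = {
    1: ("CACGCGGAGGCTCTCAAT", 7448, 7465),
    4: ("GGTGGCGAAGCGGCCCTC", 7010, 7027),
    8: ("TCAGCAGTTATAGATATG", 1409, 1426),
    9: ("ATGCTATATACATAGGTC", 612, 629),
    16: ("CAACAGGTACTACGTGAC", 4010, 4027),
    17: ("TGTGGCCCAAGATTTGCT", 4654, 4671),
}


def amplicon_length(a: int, b: int) -> int:
    """Span, in nucleotides, between the outermost coordinates of two primers."""
    positions = PRIMERS[a][1:] + PRIMERS[b][1:]
    return max(positions) - min(positions) + 1


# Primers 1, 8 and 17 lie downstream in their pairs, so their reverse
# complements are the sequences that anneal to the positive-sense strand.
for downstream in (1, 8, 17):
    print(downstream, reverse_complement(PRIMERS[downstream][0]))

for pair in ((1, 4), (8, 9), (16, 17)):
    print(pair, amplicon_length(*pair), "nt")
```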

New, additional primer sets (Table 6 and SEQ ID NOS: 15 to 37)
are used as probes to detect the Norwalk-related viruses. Table 7 shows
the ability of newly selected primer sets 36-35, 69-39, and 78-80 to
detect many Norwalk-related viruses. These results are additional
examples of the use of primer sets from the original Norwalk virus
sequence to detect Norwalk-related viruses. Nucleotide sequence data of
many of these viruses indicate that there is a continuum of genetic
relatedness within the RNA region described by primer sets 36-35 or
69-39 of these different viruses (from 87% to 0%), yet these different
agents can be detected using primers from the Norwalk virus genome
sequence. The sequence of 2516 nucleotides of another small round
structured virus (SRSV/KY/89; SEQ ID NO: 12) also was obtained by using
a total of 8 additional sets of primers from the original Norwalk virus
sequence (primers 56 and 23, 42 and 55, 58 and 59, 60 and 61, 72 and 63,
76 and 77, 64 and 75, and 74 and 3; Table 6).

Example 6

Preparation of polyclonal antibodies 
and monoclonal antibodies to Norwalk virus proteins 
Protein(s) encoded in the cDNA fragments or derivatives thereof are
produced in a prokaryotic or eukaryotic expression system and used to
immunize animals to produce polyclonal antibodies for diagnostic assays.
Prokaryotic hosts may include Gram negative as well as Gram positive
bacteria, such as E. coli, S. typhimurium, Serratia marcescens, and
Bacillus subtilis. Eukaryotic hosts may include yeast, insect or
mammalian cells. Immunized animals may include mammals such as
guinea pigs, mice, rabbits, cows, goats or horses or other non-mammalian
or non-murine species such as chickens. Repeated immunization of these
animals with the expressed protein mixed with an adjuvant such as
Freund's adjuvant to enhance stimulation of an immune response produces
antibodies to the protein.

Alternatively, synthetic peptides of greater than 15 amino acids
made to match the amino acid sequence deduced from the partial cDNA
sequence (or from other sequences determined by sequencing additional
cDNAs detected with the original or other clones) are linked to a carrier
protein such as bovine serum albumin or lysozyme or cross-linked by
treatment with glutaraldehyde and used to immunize animals to produce
polyclonal antibodies for diagnostic tests.

The sera of animals immunized with either the expressed protein
or with synthetic peptides are tested by immunologic assays such as
immune electron microscopy, Western blots (immunoblots) and blocking
ELISAs to demonstrate that antibodies to Norwalk and related viruses
have been made. Reactivities with the expressed protein or synthetic
peptides show specificity of the polyclonal sera. Reactivities with other
viruses in the Norwalk group (Snow Mountain Agent, Hawaii Agent,
Taunton Agent, etc.) indicate production of a reagent which recognizes
cross-reacting epitopes.

BALB/c mice injected with the immunogens as described above and
shown to have produced polyclonal antibodies are boosted with
immunogen and then sacrificed. Their spleens are removed for fusion of
splenocytes with myeloma cells to produce hybridomas. Hybridomas
resulting from this fusion are screened for their reactivity with the
expressed protein, the peptide and virus particles to select cells producing
monoclonal antibodies to Norwalk virus. Screening of such hybridomas
with Norwalk-related viruses permits identification of hybridomas
secreting monoclonal antibodies to these viruses as well.

Development of Diagnostic Assays 

Analysis of the deduced amino acid sequence of the Norwalk virus
genome has shown that the Norwalk virus has the genetic organization
shown in Figure 8. Expression of regions of this genome in cell-free
translation systems and in the baculovirus expression system has shown
that the 5'-end of the genome encodes nonstructural proteins and the
3'-end of the genome encodes at least one structural protein. Based on this
information, one can express the complete genome or subgenomic regions
of the genome to produce diagnostic assays to detect viral antigens or
immune responses to specific regions of the genome. This information can
be used to detect the Norwalk virus, antigens or immune responses to
Norwalk virus. This information also can be used to detect other similar
currently uncharacterized viruses that cause gastroenteritis or possibly
other diseases. Some of these viruses will be in the Caliciviridae or in the
picornavirus superfamily. All of these viruses will have matching or
similar genomic regions in their DNA sequences.

The availability of cDNA clones from viruses related to Norwalk
virus enables the production of new antibodies and antisera for diagnostic
assays for these related viruses. For example, availability of cDNA clones
from caliciviruses which cannot be cultivated permits the expression of
protein products of those clones. The protein products are used to develop
new antibodies and antisera. In addition, genetic engineering is used to
combine the cDNAs from viruses related to Norwalk virus with the cDNAs
from Norwalk virus to produce chimeric proteins, such that part of the
protein produced is derived from Norwalk virus genome sequence and
another part of the protein is derived from the genome sequence of a virus
related to Norwalk virus. These chimeric proteins are then used to
produce diagnostic reagents, vaccines and antivirals. Examples of the
diagnostic assays are shown in the specific examples and figures below.

Example 7 

Development of diagnostic assays to detect nucleic acids
of Norwalk virus or Norwalk-related viruses by detection
of specific regions of the viral genomes
based on an understanding of the Norwalk virus genome.

The genetic organization of the Norwalk virus genome allows the
prediction of specific regions of the gene sequence as regions where
oligonucleotide primers or probes can be developed to detect Norwalk
virus sequences and common sequences of other related or similar viruses.
Some of these common genome sequences are found in viruses in the
Caliciviridae or in the picornavirus superfamily. The detection can be
done by standard PCR, hybridization or other gene amplification methods.

Two primers, named 35 (CTT GTT GGT TTG AGG CCA TAT,
complementary to nt 4944-4924 in the Norwalk virus genome, SEQ ID
NO: 15) and 36 (ATA AAA GTT GGC ATG AAC A, nt 4475-4493 in the
Norwalk virus genome, SEQ ID NO: 16), were chosen from the region
likely to encode the Norwalk virus RNA polymerase. These primers then
were used to prepare a cDNA clone by reverse transcriptase-PCR from the
nucleotide sequence of human calicivirus Sapporo strain (HuCV Sapporo),
1982 outbreak (Figure 9, SEQ ID NO:5). The resulting sequence was
compared to that of Norwalk virus and of feline and rabbit caliciviruses
available from GenBank. The first cDNA clone from Sapporo, named
"c-29_4-gel" and determined to contain calicivirus sequence, was 488
nucleotides long, of which 40 nucleotides were contributed by primers 36
and 35, leaving 448 nucleotides unique to human calicivirus Sapporo. The
sequence of clone c-29_4-gel between primers 36 and 35 also is shown in
Figure 9, SEQ ID NO:8.
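
The primer and clone lengths quoted above can be cross-checked with simple
arithmetic. The sketch below uses only the primer sequences, genome
coordinates and clone length given in the text; the variable names are
illustrative, and the Norwalk product size is an inferred span between the
stated coordinates rather than a value reported here.

```python
# Primer sequences as listed, with spacing removed.
PRIMER_35 = "CTT GTT GGT TTG AGG CCA TAT".replace(" ", "")  # complementary to nt 4944-4924
PRIMER_36 = "ATA AAA GTT GGC ATG AAC A".replace(" ", "")    # nt 4475-4493

# Nucleotides contributed by the two primers to any product they generate.
primer_total = len(PRIMER_35) + len(PRIMER_36)

# Span on the Norwalk genome from the start of primer 36 to the end of primer 35.
norwalk_product = 4944 - 4475 + 1
norwalk_between_primers = norwalk_product - primer_total

# The Sapporo clone c-29_4-gel was 488 nt long including both primers.
sapporo_between_primers = 488 - primer_total

print(primer_total, norwalk_product, norwalk_between_primers, sapporo_between_primers)
```

On these figures the primers contribute 40 nucleotides, in agreement with
the text, and the Sapporo insert between the primers (448 nucleotides) is
somewhat longer than the corresponding Norwalk span.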

Evidence that the HuCV Sapporo cDNA clone was correct is shown
by five facts. First, the sequence exhibits strong homology with Norwalk
virus, feline calicivirus, and the rabbit calicivirus at the nucleotide and
amino acid levels (see Figure 10 and Tables 7 and 8). Second, the
sequence contains a continuous protein encoding region on the positive
strand. In Norwalk, feline, and rabbit caliciviruses continuous protein
encoding regions also are found in the region of homology. Third, the
sequence contains the amino acid motif YGDD, which is a marker for
RNA virus proteins which have RNA-dependent RNA polymerase activity.
In c-29_4-gel, the YGDD motif is at the predicted distance from the ends
of the sequence. Fourth, the same cDNA product was obtained from six
different stool specimens. Fifth, no significant homologies were found with
other sequences in GenBank.

The nucleotide sequence of c-29_4-gel was used to synthesize an
internal primer. This internal primer was used to prepare a second set of
RT-PCR products from human calicivirus Sapporo RNA. A number of
new cDNA clones were obtained, of which one, named "at23s2m31",
contains overlapping sequence which is 5' on the virus genome from that
contained in c-29_4-gel. Sequence at23s2m31 is 149 nucleotides long
(SEQ ID NO:7) and overlaps c-29_4-gel by 46 nucleotides. See Figure 9
for the at23s2m31 sequence and area of overlap with c-29_4-gel. The
resulting combined sequence information of c-29_4-gel and at23s2m31 is
551 nucleotides in length, excluding the portion of c-29_4-gel contributed
by primer 35.
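
Joining two overlapping clones into one sequence can be sketched as
follows. This is a minimal illustration, not the procedure used to assemble
the Sapporo sequence: the function names and the toy sequences are
hypothetical, and only exact suffix-prefix overlaps are considered. The
length check reproduces the 551-nucleotide figure if the 448 nucleotides of
c-29_4-gel between the primers are taken as the downstream clone.

```python
def merge_by_overlap(upstream: str, downstream: str, min_overlap: int = 1) -> str:
    """Join two sequences on the longest suffix of `upstream` that is
    also a prefix of `downstream` (exact matches only)."""
    longest = min(len(upstream), len(downstream))
    for k in range(longest, min_overlap - 1, -1):
        if upstream.endswith(downstream[:k]):
            return upstream + downstream[k:]
    raise ValueError("no overlap of the required length")


def combined_length(len_a: int, len_b: int, overlap: int) -> int:
    """Length of the merged sequence when two clones share `overlap` nt."""
    return len_a + len_b - overlap


# Toy sequences (hypothetical) sharing a 4-nt overlap, "TGCA".
print(merge_by_overlap("ACGTTGCA", "TGCAGGTC", min_overlap=4))

# at23s2m31 (149 nt) overlapping the 448 nt of c-29_4-gel by 46 nt.
print(combined_length(149, 448, 46))
```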

Although the human calicivirus Sapporo sequence was generated
from knowledge of the Norwalk virus sequence, the former is
distinguishable in the same region (see Table 8 or Figure 9). The known
sequence of human calicivirus Sapporo indicates that this virus is more
closely related to the animal caliciviruses than to Norwalk virus.

In May, 1987, a child in Houston was infected with a virus which
was identified as a calicivirus based on its morphology. Samples
containing virus particles from this child failed to react in serologic assays
developed for the detection of Norwalk virus and human calicivirus
Sapporo. Primers 36 and 35 were used to prepare cDNA from the viral
genome of this strain using RT-PCR. The resulting cDNA product, called
4847 complete, is 434 nucleotides long, excluding the primers, and is
distinguishable from that of Norwalk virus and human calicivirus
Sapporo (see "Houston" in homology comparison in Figure 10; Table 10
and SEQ ID NO:10). Evidence that this Houston cDNA is correct is the
same as that listed for c-29_4-gel above, except that homology with
Norwalk virus and human calicivirus Sapporo is not statistically
significant.

Use of the sequence from the human calicivirus
Sapporo strain to produce amplification primers
for human calicivirus Sapporo and related agents
The known sequence of human calicivirus Sapporo overlaps one of
the two primers, called primer 36 (see Table 6), used for the initial
amplification of cDNA clone c-29_4-gel. Examination of the homology of
known calicivirus sequences (Table 8; SEQ ID NOS: 57 through 62) in that
region indicated that a new 36 primer could be synthesized and used to
amplify caliciviruses more closely related to human calicivirus Sapporo
than to Norwalk virus. A new primer was synthesized and is called primer
"new 36" (see Table 6, last line, and SEQ ID NO:37).

The new 36 primer was used with primer 35 to generate a cDNA
clone from a calicivirus which caused a diarrhea outbreak in November,
1986, in a Houston day care center ("Day care"). The calicivirus strain
causing this Day care outbreak was antigenically related to human
calicivirus Sapporo but antigenically distinct from Norwalk virus by EIA.
The Day care cDNA product obtained from the RT-PCR reaction with
primers new 36 and 35 is 445 nucleotides long, excluding the primers (see
Figure 9 and SEQ ID NO:9), and has close homology to human calicivirus
Sapporo and a more distant, yet still significant, homology with Norwalk
virus, as shown in Figure 10. Evidence that this Day care cDNA is correct
is the same as that listed for c-29_4-gel above.

Use of primers 35 and 36 derived from the Norwalk virus 
sequence to derive a cDNA clone from an animal calicivirus 
A calicivirus was isolated from the mouth of the pygmy chimpanzee,
Pan paniscus. This calicivirus is antigenically distinct from the human
calicivirus Sapporo strain by EIA. A cDNA was produced from the
primate calicivirus (PrCV) RNA using RT-PCR and primers 36 and 35.
The complete nucleotide sequence of this cDNA is not yet available. The
cDNA, called atprcvw2 (Figure 20; SEQ ID NOS: 13 and 14), is of the
predicted size and has significant nucleotide homology with human
calicivirus Sapporo, feline calicivirus(es), and the rabbit calicivirus in the
region of known sequence. No significant homology with Norwalk virus
has been observed in the region of known sequence. The known amino
acid sequence contains the YGDD motif on the positive strand at the
predicted distance from primer 35.

Use of multiple primers from the Norwalk virus genomic sequence to
detect and characterize KY89, another small round virus associated
with an outbreak of gastroenteritis.
The known sequence for Norwalk virus is used to obtain the
sequence of other viruses such as SRSV/KY/89, an agent from a stool from
an outbreak of gastroenteritis in Japan in 1989. Originally, cDNA
products and sequence information were obtained using primer set 36-35.
Continued work with another 8 sets of primers (primers 56 and 23, 42
and 55, 58 and 59, 60 and 61, 72 and 63, 76 and 77, 64 and 75, and 74 and
3 in Table 6 and SEQ ID NOS: 21 through 36) allowed the SRSV/KY/89
sequence of 2516 nucleotides to be determined (Figures 11 and 12, SEQ
ID NO:12). This sequence includes part of the polymerase region and
the capsid region of the genome. Figures 14 and 6 (SEQ ID NOS: 38
through 50 and 63 through 75) show sequences from other
Norwalk-related viruses. Continued use of this approach with other
Norwalk-related viruses (such as those shown in Table 7) allows the discovery of
the complete sequences of multiple Norwalk-related viruses. Those skilled
in the art will realize that the use of such sequence information and
expression of fragments and derivatives of Norwalk-related viruses
permits development of diagnostic assays to detect antibodies, antigens,
viral genetic material or antivirals and to develop vaccines for specific
Norwalk-related viruses in the same manner that Norwalk virus
fragments and derivatives have been used.

Example 8 

Development of diagnostic assays using expressed Norwalk
virus proteins to detect immune responses to Norwalk virus

Protein(s) encoded in the Norwalk virus genome or fragments or
derivatives thereof are produced in a prokaryotic or eukaryotic expression
system and used as antigens in diagnostic assays to detect immune
responses following virus infections. Prokaryotic hosts may include Gram
negative as well as Gram positive bacteria, such as Escherichia coli,
Salmonella typhimurium, Serratia marcescens, Bacillus subtilis,
Staphylococcus aureus and Streptococcus sanguinis. Eukaryotic hosts
may include yeast, insect or mammalian cells. Diagnostic assays may
include many formats such as enzyme-linked immunosorbent assays,
radioimmunoassays, immunoblots or other assays. Figure 15 shows data
for a capsid protein encoded from the 3'-end of the Norwalk virus genome.
It is expressed by nucleotides 5337 through 7753 of the DNA sequence
shown in Table 2 and Figure 8. This protein has an approximate
molecular weight of 58,500 and is hereinafter referred to as the 58,500
mwt protein. It was produced in insect cells infected with baculovirus
recombinants (C-6 and C-8). A band (see arrow in Figure 15) representing
the 58,500 mwt protein in C-6 and C-8 infected cells is not seen in insect
cells infected with wild-type (WT) baculovirus or in mock infected cells.
Other proteins encoded by Norwalk virus cDNA or fragments or
derivatives are similarly expressed using baculovirus recombinants and
other expression systems.

Figure 16 shows data using the 58,500 mwt protein produced using
the baculovirus expression system to detect immune responses before and
after infection of volunteers with Norwalk virus inoculum. Antigen was
put on ELISA plates and pre- and post-infection human sera were added.
The data show that when an individual has had the infection, the
post-infection serum reacts strongly to the antigen. Other proteins encoded in the
Norwalk virus cDNA or fragments or derivatives thereof are similarly
used to detect immune responses following Norwalk virus infection.

Some proteins have the intrinsic property of being able to form
particles. The 58,500 mwt protein discussed above has that property.
Particles formed from proteins are expressed in any expression system
and used to produce diagnostic assays based on detection of antibody
responses or immune responses. Figure 17 shows an electron micrograph
of particles produced using the baculovirus expression system from
recombinants containing the 3'-end of the Norwalk genome. These
particles are similar in size to the native virus particles. They are
antigenic, immunoreactive and immunogenic. They differ from most of
the virus particles resulting from natural infection in that many of the
expressed particles lack nucleic acids. The rNV particles are highly
immunogenic when given parenterally to mice, rabbits and guinea pigs
and when given orally to mice.

Figure 18 shows data on the properties of rNV particles following
centrifugation in gradients of CsCl. The density of the particles
(symbolized by closed boxes) is 1.31 g/cc, which is distinct from the 1.38
g/cc density of particles purified from the original infectious Norwalk
inoculum given to volunteers. The gradients were fractionated. Each
fraction was put on an ELISA plate and human serum was then
introduced. The open boxes show that there was no ELISA activity with
the pre-infection serum. The closed diamonds show there was reactivity
with the post-infection serum. Other particles made from other proteins
encoded in the Norwalk virus cDNA or fragments or derivatives thereof
are similarly used to detect immune responses following Norwalk virus
infection.
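
Locating the antigen peak in a fractionated gradient, as described for Figure 18, amounts to finding the fraction with the highest ELISA signal and reading off its density. The sketch below uses invented fraction values purely for illustration; only the 1.31 g/cc and 1.38 g/cc densities come from the text.

```python
def peak_density(fractions):
    """Return the density of the fraction with the strongest ELISA signal.

    `fractions` is a list of (density_g_per_cc, elisa_od) pairs.
    """
    if not fractions:
        raise ValueError("no fractions supplied")
    density, _od = max(fractions, key=lambda pair: pair[1])
    return density


# Hypothetical readings with post-infection serum; not data from Figure 18.
example_fractions = [
    (1.25, 0.05),
    (1.28, 0.30),
    (1.31, 1.10),
    (1.34, 0.40),
    (1.38, 0.08),
]
print(peak_density(example_fractions))
```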

Figure 19 shows data using purified particles formed by the 58,500
mwt protein to detect immune responses in post-inoculation (but not
pre-inoculation) serum samples of 9 volunteers infected with Norwalk virus.
One of the volunteers, number 6, showed no evidence of Norwalk virus
infection based on monitoring clinical symptoms or measuring an immune
response. Purified, expressed particles were put on ELISA plates and one
pre- and one post-infection serum sample from each volunteer were added
to the particles. The amount of antibody binding to the particles in each
pre- and post-infection sample was measured. The data in Figure 19 show
that the expressed proteins form particles that are immunoreactive and
antigenic. Other proteins encoded in the Norwalk virus cDNA or
fragments or derivatives thereof are similarly used to detect
immunoreactive and antigenic activity.

Additional developments of diagnostic assays for the detection of
Norwalk and Norwalk-related viruses also were pursued. First, new
ELISA assays were made based on utilizing the Norwalk virus capsid
protein that was engineered to be synthesized from a cDNA fragment that
was deduced from the Norwalk virus cDNA sequence and then produced
using the baculovirus expression system. This expressed Norwalk virus
capsid protein self-assembled into recombinant Norwalk virus particles
(rNV). Two new ELISA assays were established using this rNV antigen.
One assay detects antiviral antibody and the other detects viral antigen.
Both ELISAs are very sensitive when compared to the previous assays
(based on reagents from human volunteers) available to detect such
agents. Further characterization of the antibody ELISA has shown this
assay detects immune responses following human infections with Norwalk
virus and a subset of human infections with viruses in the Norwalk group
such as Snow Mountain and Hawaii agents. In contrast, the antigen
ELISA is based on use of hyperimmune serum made to the
baculovirus-expressed recombinant Norwalk virus particles (rNV). This
antigen ELISA has been found to be very specific in that it recognizes the
prototype Norwalk virus (8FIIa) and a subset of closely related agents, but
not all other viruses in the Norwalk group such as the Snow Mountain
agent and Hawaii agent (See Tables 1 and 7). While the antigen ELISA
does not detect other viruses in the Norwalk group such as the small
round structured viruses or caliciviruses, these and other Norwalk-related
viruses have been detected using primers selected from the
nucleotide sequence of Norwalk virus (See Table 7).

To develop more broadly reactive diagnostic assays, ELISAs based
on using other fragments of the Norwalk virus genome were developed.
The new diagnostic assays are based on detection of antibody responses
or of antigens deduced from fragments of the Norwalk virus genome other
than the capsid region. An example of this approach, with data, follows.

One Norwalk virus nonstructural protein is predicted to be encoded
in the first ORF of the Norwalk viral genome. This ORF is located at the
5'-end of the viral genome and it has a predicted molecular weight of
190,000 (190K). Whether this ORF 1 is useful in diagnostic assays first
was evaluated by expressing the protein encoded in the full-length viral
RNA, and then synthesizing and testing the immunoreactivity of the
encoded protein using a cell-free system. This was accomplished by in
vitro transcription of a full-length cDNA (pGNV-F) of the Norwalk viral
genome. This full-length cDNA was constructed by ligation of
subgenomic derivatives of the original Norwalk virus cDNAs shown in the
physical map in Figure 5. The in vitro synthesized NV mRNAs next were
examined for their ability to direct the synthesis of a Norwalk
virus-specific protein by cell-free translation in rabbit reticulocyte lysates in the
presence of radiolabeled methionine to produce a radiolabeled protein. The
expressed proteins were analyzed by polyacrylamide gel electrophoresis
(PAGE). A clear band of approximate molecular weight of 130,000 was
observed in the sample containing the viral RNA but not in the negative
control (without viral RNA). The immunoreactivity of this protein was
examined by reactivity with pre- and post-infection sera from volunteers
given Norwalk virus. The 130K protein was precipitated by a
convalescent serum of a volunteer infected with Norwalk virus, but not by
serum collected before infection, indicating this protein was virus-specific.
This showed this 130K protein contains some immunoreactive epitopes.
The apparently smaller size of the protein made in this translation system
suggested that either the protein migrates aberrantly on gels, an
internal initiation codon was used to begin translation, or some type of
post-translational modification occurred after the protein was
translated.
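
The predicted sizes quoted in the text can be checked roughly from the nucleotide coordinates alone. The sketch below is an approximation under stated assumptions: it treats each coordinate range (146 through 5359 for ORF 1, 5346 through 6935 for the capsid, both given elsewhere in this specification) as containing only sense codons, and it uses an average residue mass of 110 daltons, a common rule of thumb that is not a value from the specification.

```python
AVERAGE_RESIDUE_DA = 110.0  # assumed rule-of-thumb mass per amino acid


def residue_count(start, end):
    """Number of codons in an inclusive, 1-based nucleotide range."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError("range is not a whole number of codons")
    return length // 3


def approx_mass_da(start, end):
    """Rough protein mass in daltons for the given coding range."""
    return residue_count(start, end) * AVERAGE_RESIDUE_DA


for label, (start, end) in {"ORF 1": (146, 5359),
                            "capsid": (5346, 6935)}.items():
    print(label, residue_count(start, end), round(approx_mass_da(start, end)))
```

Under these assumptions the estimates come out near 191,000 for ORF 1 and near 58,000 for the capsid, in line with the 190K and 58,500 figures given in the text. The 130K band is well below the ORF 1 estimate, which is the discrepancy the paragraph above sets out to explain.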

To further characterize immunoreactive derivatives of the Norwalk
virus cDNA useful for diagnostic assays, the 2C region of the Norwalk
viral genome (see Figure 8) was expressed using the baculovirus
expression system. This region was selected for initial expression because
it is located at the 5'-end of the nonstructural protein and a high level of
conservation was found between the sequence of the predicted Norwalk
virus protein and new sequences published for related caliciviruses and
picornaviruses. A 5'-end cDNA fragment of the viral genome was subcloned
into the baculovirus transfer vector pVL 1393. After co-transfection of
insect Sf9 cells with wild-type baculovirus DNA, recombinants containing
the Norwalk viral gene were identified and selected. After three rounds
of plaque purification, radiolabeled lysates of recombinant-infected insect
cells were prepared, and the radiolabeled proteins were analyzed by
PAGE. The results showed that a protein of apparent molecular weight
of 57,000 (57K) was made in recombinant-infected but not in uninfected
cells. The size of the protein suggested that the internal AUG initiation
codon located at nucleotide 953 was used for making this protein. This
57K protein also was precipitated by convalescent serum (but not by
pre-infection serum) from a volunteer who was infected with Norwalk virus.
This protein mainly remained cell-associated. One skilled in the art will
readily see that improvements in the yield and purification of this 2C
nonstructural protein are possible and will yield more rapid ELISAs to
detect Norwalk and related virus infections. One skilled in the art also
will see that by expressing proteins from other regions of the Norwalk
viral genome (e.g., 3C-like, 3D-like and the third ORF), diagnostic assays are
made for Norwalk and related viruses similar to the ELISAs made with
the 2C nonstructural and rNV structural proteins. These new assays
should widen the spectrum in detection of Norwalk-related viruses.

The initial lack of sensitive methods to detect Norwalk and
Norwalk-related viruses made the many Norwalk-related viruses
difficult to describe and define. However, as shown in Table 7, the methods
and data provided here demonstrate how the discovery of the nucleotide
sequence of the Norwalk virus genome has led to the ability to develop tests to
detect Norwalk virus and other related agents. The data and methods
also demonstrate that fragments and derivatives of the Norwalk virus
genome can be used to provide evidence of and immunity against Norwalk
and related viruses.

Example 9

Development of diagnostic assays using expressed
Norwalk virus and Norwalk-related viruses to detect viral antigens
Individual proteins, particles or protein aggregates formed from
expression of one or more Norwalk virus genes in any prokaryotic or
eukaryotic expression system are used as immunogens to inoculate
animals to produce polyclonal and monoclonal antibodies for diagnostic
assays to detect viral antigens.

Recombinant Norwalk virus particles (rNV) produced using the
baculovirus expression system have been used to produce polyclonal
antibodies in mice, guinea pigs and rabbits following parenteral
immunization (see Table 9). Mice given rNV orally also have developed
serum antibodies. Hybridomas from mice immunized with rNV also have
been obtained following fusion with myeloma cells. Use of these antibodies
in a capture ELISA has shown NV antigen can be detected. This antigen
ELISA based on the antiserum made to the rNV particles is quite specific
and it detects only a subset of Norwalk-related viruses (See Table 7).
Therefore, additional capsid antigens from other Norwalk-related viruses
(such as Snow Mountain, Hawaii, etc.) must be expressed to produce a more
broadly reactive ELISA for capsid antigen. The ELISA is only one format
that can be used to detect virus antigen. Other formats could include
immunofluorescence, immunocytochemistry or immune electron
microscopy. The comparison of the capsid sequences of Norwalk virus and
Norwalk-related viruses permits the identification of conserved regions of
the capsid protein. Use of fragments of such sequences to immunize
animals can result in the production of antisera with broader
reactivity to Norwalk-related viruses. Alternatively, sequential
immunization of animals with expressed proteins of Norwalk and
Norwalk-related viruses will result in antiserum with the desired broad
reactivity. Antigen detection assays that are specific to one or a few
strains of Norwalk and Norwalk-related viruses and additional assays that
are more broadly reactive each will have use.

Expression of fragments of proteins encoded in other regions of the
genome can be used to produce antiserum to other proteins for use in
ELISAs to detect viral antigens. The expression of the first ORF that
represents a polyprotein encoded in the 5'-end of the genome and
fragment 2C of the polyprotein has shown that each of these
nonstructural proteins is immunoreactive, and antiserum made to these
can be used to develop diagnostic assays to detect these viral proteins.
These assays can be broadly reactive and detect many other
Norwalk-related viruses because of sequence conservation. Those skilled in the art
will recognize that knowledge of the genome organization of Norwalk
virus permits similar expression of the same regions of the genomes of
other Norwalk-related viruses for use in diagnostic assays to detect viral
antigens.

Example 10
Development of a vaccine using
Norwalk virus expressed antigens
Vaccines for Norwalk virus, the Norwalk group of viruses or other
small round viruses are made from an expressed Norwalk virus protein.
That expressed protein can be a Norwalk virus capsid protein expressed
alone or in combination with one or more other Norwalk virus proteins
or self-forming particles. For example, the particles shown in Figure 17
were produced using the baculovirus expression system. They are used as
a vaccine when expressed alone or in combination with one or more other
Norwalk virus proteins. Similarly, the other proteins encoded in the
Norwalk virus cDNA or fragments or derivatives thereof are used as a
vaccine when expressed alone or in combination with one or more
Norwalk virus proteins.
Individuals are vaccinated orally, parenterally or by a combination
of both methods. For parenteral vaccination, the expressed protein is
mixed with an adjuvant and administered in one or more doses in
amounts and at intervals that give maximum immune response and
protective immunity. Oral vaccination parallels natural infection by
Norwalk virus inoculum, i.e., the individual ingests the vaccine with
dechlorinated water or buffer. Oral vaccination may follow sodium
bicarbonate treatment to neutralize stomach acidity. For example,
sodium bicarbonate solution is taken by each person 2 minutes before and
5 minutes after vaccine administration.

Example 11

Production of a vaccine for other agents by using
expressed Norwalk virus capsids as a carrier or vehicle
for the expression of other antigens
or parts of other antigens
Identification of the region of the genome that encodes the Norwalk
virus capsid protein and that forms particles following expression (i.e.,
regions 5346 through 6935 and 5337 through 7753) allows genetic
engineering of the cDNA that encodes the capsid protein to incorporate
one or more heterologous pieces of cDNA that encode antigenic epitopes.
Expression of such recombinant genes produces a recombinant capsid that
is antigenic, induces antibodies, and protects against Norwalk virus and
its antigens, and against the heterologous epitopes or antigens.
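
For a heterologous epitope to be expressed as part of the capsid protein, its coding sequence has to be placed in the capsid reading frame. The sketch below shows only that bookkeeping. The function name, the toy sequences and the insertion site are hypothetical and do not represent a construct described in this specification.

```python
def insert_in_frame(orf, insert, codon_index):
    """Insert `insert` into `orf` immediately before codon `codon_index`.

    Both sequences must be a whole number of codons so that the reading
    frame downstream of the insertion is unchanged. `codon_index` is
    0-based.
    """
    if len(orf) % 3 != 0 or len(insert) % 3 != 0:
        raise ValueError("sequences must be a whole number of codons")
    position = 3 * codon_index
    if not 0 <= position <= len(orf):
        raise ValueError("insertion site lies outside the ORF")
    return orf[:position] + insert + orf[position:]


# Toy example: a three-codon ORF receiving a two-codon insert
# just before its final (stop) codon.
toy_orf = "ATGGCTTAA"
toy_epitope = "GGGCCC"
print(insert_in_frame(toy_orf, toy_epitope, 2))
```

Whether a given insertion site still allows the capsid protein to assemble into particles is an experimental question that this check does not address.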

Alternatively, the Norwalk virus capsid protein carrier is mixed with
or covalently linked to one or more heterologous protein antigens or
synthetic peptides containing heterologous epitopes. This mixture is
antigenic, induces antibodies, and protects against Norwalk virus and its
antigens, and against the heterologous epitopes or antigens.

Individuals are vaccinated using the oral and parenteral methods
described above in Example 10.

Example 12
Kit

Kits for detecting immune responses to Norwalk virus are
prepared by supplying in a container a protein deduced from the Norwalk
virus genome shown in Table 2 or fragments or derivatives thereof.
Similar proteins are prepared from Norwalk-related viruses to detect
immune responses to the Norwalk-related viruses. For example, the
protein encoded by Norwalk virus nucleotides 1 through 7753, the protein
encoded by Norwalk virus nucleotides 146 through 5359, the protein
encoded by Norwalk virus nucleotides 5337 through 7573, the protein
encoded by Norwalk virus nucleotides 5346 through 6935, the protein
encoded by Norwalk virus nucleotides 6938 through 7573 and any
combinations thereof may be used in such kits. The kit can also include
controls for false positives and false negatives, reagents and sample
collection devices. The kit can be equipped to detect one sample or
multiple samples.

Example 13
Kit

Kits for detecting Norwalk viruses and Norwalk-related viruses are
prepared by supplying in a container at least one antiserum made from a
protein expressed from the deduced amino acid sequence of the Norwalk
virus genome shown in Tables 3, 4, or 5 or from a fragment or derivative
of the deduced amino acid sequence. Similar antisera are made from
proteins encoded by Norwalk-related virus genomes. For example, an
antiserum made to the protein encoded by Norwalk virus nucleotides 1
through 7753, the protein encoded by Norwalk virus nucleotides 146
through 5359, the protein encoded by Norwalk virus nucleotides 5337
through 7573, the protein encoded by Norwalk virus nucleotides 5346
through 6935, the protein encoded by Norwalk virus nucleotides 6938
through 7573 and any combination thereof may be used in such kits. The
kit can also include controls for false positives and false negatives,
reagents and sample collection devices. The kit can be equipped to detect
one sample or multiple samples.

In conclusion, it is seen that the present invention and the
embodiments disclosed herein are well adapted to carry out the objectives
and attain the ends and advantages mentioned as well as others inherent
therein. The novel features characteristic of this invention are set forth
in the appended claims. While presently preferred embodiments of the
invention have been described for the purpose of disclosure, numerous
changes in the details of synthesis and use described herein will be
apparent to those skilled in the art. It should be understood, however,
that there is no intention to limit the invention to the specific form
disclosed, but on the contrary, the intention is to cover all modifications,
alternative means of synthesis and use and equivalents falling within the
spirit and scope of the invention.

Table 2. The nucleotide sequence of Norwalk virus genome.



GGCGTCAAAA GACGTCGTTC CTACTGCTGC TAGCAGTGAA AATGCTAACA ACAATAGTAG 60
TATTAAGTCT CGTCTATTGG CGAGACTCAA GGGTTCAGGT GGGGCTACGT CCCCACCCAA 120
CTCGATAAAG ATAACCAACC AAGATATGGC TCTGGGGCTG ATTGGACAGG TCCCAGCGCC 180
AAAGGCCACA TCCGTCGATG TCCCTAAACA ACAGAGGGAT AGACCACCAC GGACTGTTGC 240
CGAAGTTCAA CAAAATTTGC GTTGGACTGA GAGACGAGAA GACCAGAATG TTAAGACGTG 300
GGATGAGCTT GACCACACAA CAAAAGAACA GATACTTGAT GAACACGCTG AGTGGTTTGA 360
TGCCGGTGGC TTAGGTCCAA GTACACTACC CACTAGTCAT GAACGGTACA CACATGAGAA 420
TGATGAAGGC CACCAGGTAA AGTGGTCGGC TAGGGAAGGT GTAGACCTTG GCATATCCGG 480
GCTCACGACG GTGTCTGGGC CTGAGTGGAA TATGTGCCCG CTACCACCAG TTGACCAAAG 540
GAGCACGACA CCTGCAACTG AGCCCACAAT TGGTGACATG ATCGAATTCT ATGAAGGGCA 600
CATCTATCAT TATGCTATAT ACATAGGTGA AGGGAAGACG GTGGGTGTAC ACTCCCCTCA 660
AGCAGCCTTC TCAATAACGA GGATCACCAT ACAGCCCATA TCAGCTTGGT GGOGAGTCTG 720
TTATGTCCCA CAACCAAAAC AGAGGCTCAC ATACGACCAA CTCAAAGAAT TAGAAAATGA 780
ACCATGGCCG TATGCCGCAG TCACGAACAA CTGCTTCGAA TTTTGTTGCC AGGTCATGTG 840
CTTGGAAGAT ACTTGGTTGC AAAGGAAGCT CATCTCCTCT GGCCGGTTTT ACCACCCGAC 900
CCAAGATTGG TCCCGAGACA CTCCAGAATT CCAACAAGAC AGCAAGTTAG AGATGGTTAG 960
GGATGCAGTG CTAGCCGCTA TAAATGGGTT GGTGTCGCGG CCATTTAAAG ATCTTCTGGG 1020
TAAGCTCAAA CCCTTGAACG TGCTTAACTT ACTTTCAAAC TGTGATTGGA CGTTCATGGG 1080
GGTCGTGGAG ATGGTGGTCC TCCTTTTAGA ACTCTTTGGA ATCTTTTGGA ACCCACCTGA 1140
TGTTTCCAAC TTTATAGCTT CACTCCTGCC AGATTTCCAT CTACAGGGCC CCGAGGACCT 1200
TGCCAGGGAT CTCGTGCCAA TAGTATTGGG GGGGATCGGC TTAGCCATAG GATTCACCAG 1260
AGACAAGGTA AGTAAGATGA TGAAGAATGC TGTTGATGGA CTTCGTGCGG CAACCCAGCT 1320
CGGTCAATAT GGCCTAGAAA TATTCTCATT ACTAAAGAAG TACTTCTTCG GTGGTGATCA 1380
AACAGAGAAA ACCCTAAAAG ATATTGAGTC AGCAGTTATA GATATGGAAG TACTATCATC 1440
TACATCAGTG ACTCAGCTCG TGAGGGACAA ACAGTCTGCA CGGGCTTATA TGGCCATCTT 1500
AGATAATGAA GAAGAAAAGG CAAGGAAATT ATCTGTCAGG AATGCCGACC CACACGTAGT 1560
ATCCTCTACC AATGCTCTCA TATCCCGGAT CTCAATGGCT AGGGCTGCAT TGGCCAAGGC 1620
TCAAGCTGAA ATGACCAGCA GGATGCGTCC TGTGGTCATT ATGATGTGTG GGCCCCCTGG 1680
TATAGGTAAA ACCAAGGCAG CAGAACATCT GGCTAAACGC CTAGCCAATG AGATACGGCC 1740
TGGTGGTAAG GTTGGGCTGG TCCCACGGGA GGCAGTGGAT CATTGGGATG GATATCACGG 1800
AGAGGAAGTG ATGCTGTGGG ACGACTATGG AATGACAAAG ATACAGGAAG ACTGTAATAA 1860
ACTGCAAGCC ATAGCCGACT CAGCCCCCCT AACACTCAAT TGTGACCGAA TAGAAAACAA 1920
GGGAATGCAA TTTGTGTCTG ATGCTATAGT CATCACCACC AATGCTCCTG GCCCAGCCCC 1980
AGTGGACTTT GTCAACCTCG GGCCTGTTTG CCGAAGGGTG GACTTCCTTG TGTATTGCAC 2040
GGCACCTGAA GTTGAACACA CGAGGAAAGT CAGTCCTGGG GACACAACTG CACTGAAAGA 2100
CTGCTTCAAG CCCGATTTCT CACATCTAAA AATGGAGTTG GCTCCCCAAG GGGGCTTTGA 2160
TAACCAAGGG AATACCCCGT TTGGTAAGGG TGTGATGAAG CCCACCACCA TAAACAGGCT 2220
GTTAATCCAG GCTGTAGCCT TGACGATGGA GAGACAGGAT GAGTTCCAAC TCCAGGGGCC 2280
TACGTATGAC TTTGATACTG ACAGAGTAGC TGCGTTCACG AGGATGGCCC GAGCCAACGG 2340
GTTGGGTCTC ATATCCATGG CCTCCCTAGG CAAAAAGCTA CGCAGTGTCA CCACTATTGA 2400
AGGATTAAAG AATGCTCTAT CAGGCTATAA AATATCAAAA TGCAGTATAC AATGGCAGTC 2460
AAGGGTGTAC ATTATAGAAT CAGATGGTGC CAGTGTACAA ATCAAAGAAG ACAAGCAAGC 2520

Table 2, continued



TTTGAPPPPT 
x x x wav^v^^*^ x 


P.TGP.AGPAGA 

w> X \9 w AVJ VAUA 


PAATTAAPAP 


GGOOTPAPTT 
VjVjwwX wAwX X 


OOOIiTOZV OTO 
IjwwAX wAwX v 


O IV PT P A.ZV IV tl P 
V Aw X wAAAl? w 




AGCTAGGGCT 


GTGGPATAPG 

V X WV«A X AwV 


PTTPZiTCTTT 
w X 1UA1V1 X X 


wwAV X wUV/^ 


ATA APT A PP A 

AXAAwXAwwA 


T ft OT IV O IV IV IV T 
XAwXAwAAAX 


Z OfiU 


GG P.GGG AT C T 


GCGCTCGTTA 

w wW wX wVJ X X A 


TT A A TPG AGP 

X X AA X wVJJAVJ w 


GGTP A AGPGT 

W\> X vAAUvV X 


ATGTTTGGTA 


PPPGTAPAGP 

ww LV> X AwAOw 




AGPPATGGCA 


TTAGAAGGAP 

X X AUAAUV>A^« 


PTGGGft A AG A 

w X ww VAAAwA 


AP AT A ATTGP 
A wA X AA X X v w 


ivn^v TPP1VT1V 

AVIUU X w wA X A 


Aw w X AA wAj A 


Z / Ow 


AGCTGGAAAG 
*»w v«f x w aaav? 


GGGCCP.ATAG 

ww VJ w w wA X Aw 


GTP A TG ATG A 

w X vn X VJA X UA 


PATGGT AGA A 
wA X Vtv IAuAA 


iOOTTTOOOP 

A\JVJ XXX wAJ w w 


TATfi.Tfi.A A AP 
lAluluAAAV* 


z oz u 


TGAAGAGGAG 
x vaava w Aw 


GAGAGTGAGG 
w a va v x vjnvyvj 


ACCAAATTP.A 
nvwinni x v^a 


AATGGTACCA 

AA X VjVJ X A w wA 


AGTGATGPPG 
AVJ X vA X tfV#Uj 


TPPPAGA AGG 
X wwwAVjAAVjvj 


Z OO VJ 


AAAGAACAAA 


GGCAAGACCA 


AAAAGGGACG 


TGGTCGCAAA 

X Ww X vww&uul 


AATAACTATA 

AAX AAw X AX A 


ATGPATTPTP 

A X w wA X X w- X w 


2940 


TCGCCGTGGT 


CTGAGTGATG 


AAGAATATGA 


AGAGTACAAA 

A V A V X A wAAA 


AAGATCAGAG 
aa v a x vnunu 


AAGAAAAGAA 

AAV? AAAA wAA 


w UvU 


TGGCAATTAT 


AGTATACAAG 


AATACTTGGA 

«U* X civ X X W*l 


GGACCGCCAA 

W A www V* wAA 


P.GATATGAGG 

wVJA X A X VJAUU 


AAGAATTAGP 

AAV7AAX X AV7 w 


W V/ w w 


AGAGGTACAG 


GCAGGTGGTG 

wwAWX WX V 


ATGGTGGCAT 

AX ww X WWwAX 


AGGAGAAAPT 

AUUAUAAAV/ X 


GAAATGGAAA 
VAAAluuAAA 


TPPGTPAPAfi 

X www* X wAwAw 


it on 

J1ZU 


GGTCTTCTAT 

W W X W X XwXflX 


AAATCCAAGA 


GTAAGAAAPA 

w X AAV AAA wA 


P.CAAP.AAGAG 

w wAA wAA w A w 


P A APGG PG A P 

wAA w\7 w w V A w 


ft ft PTT/S/STPT 
AAwl X V7\7 1 w X 


^ 1 AO 
w lOU 


AGTGACTGGA 

«»w x wnw x wn 


TCAGACATCA 

X wtlvflw/l X wA 


GAAAAPGTAA 

Un/uUlvV7 X AA 


GPPPATTftAP 
WwwwAX X wAw 


X uuALv www w 


PAZV AflA ATHA 
wAAAwAA X vj A 


<3Z4U 


ATGG GCAG AT 
«» x w w waw a x 


GATGACAGAG 

VJA X \XA w A\9 AV7 


AGGTGGATTA 

AV7V7 X UvA X XA 


TAATGAAAAG 

X AA X Aff AAAA \9 


AlvAAl 111b 


IVlVfiPTPPPPP 

AAU W X WW WW w 


OJUU 


GACACTATGG 
wA wa w x ax w 


AGPPGAGTPA 


P A A A GTTTfiP. 

wAAAW XXX VJV» 


ATO A/IO ATtflfS 
A X WAvuA luu 


VjuL X X X X uu \» 


X wAu w w wGAw 




AGTGTTPATP 
a v x v x x wA x w 


APAAPPAPAP 


AluiAuluLL 


A A O TOO TO TP 


ft ft ft O IV IV TTfT 

AAAGAATTwT 


X X GGT G AG C C 


w4zU 


PPTATPTttfiT 


IVTZVCP^ATPP 
AlAuLAAl ww 


lv o o iv iv o o iv oo 
ALLAAuwAuVj 


TO ft O TT/* IV O » 


n ft iv hh n^i ft t 
CAATTCAGGT 


fT*l »<|IM TV IV TV M TV 7V 

TCTCAAAGAA 


3480 


A A Tfi OP PP PT 
AA X V» wly ww w X 


PIVPTTP1VP1VP 
ur A w x 1 unwnu 


ulAluvlttl 


T/*ft ft^ft ft^f^T 


T/* ^iOOT/* IV ft f+ 

TGCCCTGAAG 


M M TV M TV M ffl^fflO 

GGACAGT CTG 


3540 


w 1 Lnu X X a 


* ipqifi* * fTMMM 


Ax xCGGGTGA 


ACTACTTCCG 


CTAGCCGTCC 


GTATGGGGGC 


3600 


TIVTTPPPTPP 


a to iv rpiiip* m 
a 1 unuun 1 Au 


AViGGTCGGCT 


hi/ *(n^^ w ii i^i m m 
TGTCCATGGC 


CAATCAGGGA 


TGTTACTGAC 


3660 




ooiv sipwrfc 
v Unnnu (Malar a 


TGGATCTTGG 


CACTATACCA 


GGAGACTGCG 


M MM MTV MM TV fTlTV 

GGGCACCATA 


3720 


PGTPPAPAaCS 
wV» 1 w vAvAnu 


POPOOO. II TA TP 
UuLwuiui X 1? 


IV OTOOOTTOT 

ALluubl xl*x 


wTGTGGAGTC 


CACG CTG C AG 


n/IK ^TV TV TV MR1M] 

CCACAAAGTC 


<Vain/\ 

3780 


AnnPA apapp 


P T/tlP TP TfSPtf2 


PTPTitpappp 


T/*^ IvO ft f^^r* 


GAAACCGCAC 


TV M TV TV M^IIIIMM 

TAGAAGGTGG 


"5 O A e\ 

3840 


AGAPAAGHGG 


wAX lAlvtUu 


fiPPIVPPlVP IVT 


TO TO ft frtfT* IV T 

TGxGAGGTAT 


GGAAGTGGCC 


M.TVMMTV MmMtTTM 

CAG CACTGTC 


"3 Ann 


A APT A A A AP A 

AAV/ X AAAAwA 


ftlVftTTPTPOlV 

AAA X 1 w XAj wA 


W 1 WW 1 WW WW 


ft ^ft ft r^^lft OfTV"* 

AGaACCACTG 


CCCCCCGGAG 


T ATATGAG CC 


^ ft £ rv 
3960 


AGPATAPPTG 
au wA x nvv x vv 


GGGGGPAAflfl 


IVPPPPPCTPT 

A ww ww wV X V? X 


ACAGAATGGC 


CCATCCCTAC 


ft ft MTV M Mm TV Mm 

AACAGGTACT 


4020 


APGTGAPPAA 

A wV7 X VlAV/WlA 


PTGAA APPPT 
w X UAAAV/UL X 


X X V» LOVjA ww w 


CCGCGGCCGC 


ATGCCTGAGC 


CTGGCCTACT 


4080 


GGAGGPTGPG 


GTTGAGAPTfS 


TA APATPPAT 
X AA wA X w wA X 


GTTAGAACAG 


ACAATGGATA 


MM MM TV TV M MMM 

wwwCAAGCCC 


4140 


GTOOTPTTZVP 


V> w 1 i» A X \9 ww X 


VwwAAX wX wX 


TGACAAAACT 


ACTAGTTCGG 


MMm^V MM/ tlllM^V 

GGTACCCTCA 


il n Aft 

4200 


PPATAAAAGP. 

wwA X AAAAV9VV 


A A P & HTP IV TP 
AAwAAX VAX w 


1VTTOP 1 IV TOP 

AX X wwAAXVw* 


CACCACCTTC 


GTTGGAGAGC 


nrnrrniA TV M M TV 

T CGGTG AGC A 


4260 


ivnpT/iozvpap 

aVj Liu UauaU 




TOTIVTOIVO JV1V 


TGCTAAACAT 


ATGAAACCCA 


TTTACACTGC 


4320 


AGPPTT1V1VIV A 
nuttl lAAAA 


O IV TO IV TV OT IV o 
u a 1 unnL X AG 


TOIV IV O OO IV O IV 

TwaaGwwAGA 


AAAGATTTAT 


CAAAAAGTCA 


AGAAGCGTCT 


4380 


APTATCf2fIf2P 


ftPP/2ATPTPP 
U-UvAvAX wX w\j 


OlVlVOftOTOOT 

uAAUAb x\*\3 x 


CAGGGCCGCC 


CGGGCTTTTG 


GCCCATTTTG 


4440 


TGAPGPTUTA 
X unwv X A X a 


3V.lV2VTPfcPIV.Tf2 
AAA X UALA X V» 


X wAX waAaX X 


GCCAATAAAA GTTGGCATGA 


TV MTV MTV TV til TV M TV 

ACACAATAGA 


4500 


AGATGGPPPP 
Aur A X VfUvU \+\+ 


PTPXVTPTATP 


PTOIVOOIVTO^ 


TAAATATAAG 


AATCATTTTG 


ATGCAGATTA 


4560 


TACAGCATGG GACTCAACAC AAAATAGACA AATTATGACA GAATCCTTCT CCATTATGTC 4620
GCGCCTTACG GCCTCACCAG AATTGGCCGA GGTTGTGGCC CAAGATTTGC TAGCACCATC 4680
TGAGATGGAT GTAGGTGATT ATGTCATCAG GGTCAAAGAG GGGCTGCCAT CTGGATTCCC 4740
ATGTACTTCC CAGGTGAACA GCATAAATCA CTGGATAATT ACTCTCTGTG CACTGTCTGA 4800
GGCCACTGGT TTATCACCTG ATGTGGTGCA ATCCATGTCA TATTTCTCAT TTTATGGTGA 4860
TGATGAGATT GTGTCAACTG ACATAGATTT TGACCCAGCC CGCCTCACTC AAATTCTCAA 4920
GGAATATGGC CTCAAACCAA CAAGGCCTGA CAAAACAGAA GGACCAATAC AAGTGAGGAA 4980
AAATGTGGAT GGACTGGTCT TCTTGCGGCG CACCATTTCC CGTGATGCGG CAGGGTTCCA 5040
AGGCAGGTTA GATAGGGCTT CGATTGAACG CCAAATCTTC TGGACCCGCG GGCCCAATCA 5100

Table 2, continued



TTCAGATCCA TCAGAGACTC TAGTGCCACA CACTCAAAGA AAAATACAGT TGATTTCACT 5160
TCTAGGGGAA GCTTCACTCC ATGGTGAGAA ATTTTACAGA AAGATTTCGA GCAAGGTCAT 5220
ACATGAAATC AAGACTGGTG GATTGGAAAT GTATGTCCCA GGATGGCAGG CCATGTTCCG 5280
CTGGATGCGC TTCCATGACC TCGGATTGTG GACAGGAGAT CGCGATCTTC TGCCCGAATT 5340
CGTAAATGAT GATGGCGTCT AAGGACGCTA CATCAAGCGT GGATGGCGCT AGTGGCGCTG 5400
GTCAGTTGGT ACCGGAGGTT AATGCTTCTG ACCCTCTTGC AATGGATCCT GTAGCAGGTT 5460
CTTCGACAGC AGTCGCGACT GCTGGACAAG TTAATCCTAT TGATCCCTGG ATAATTAATA 5520
ATTTTGTGCA AGCCCCCCAA GGTGAATTTA CTATTTCCCC AAATAATACC CCCGGTGATG 5580
TTTTGTTTGA TTTGAGTTTG GGTCCCCATC TTAATCCTTT CTTGCTCCAT CTATCACAAA 5640
TGTATAATGG TTGGGTTGGT AACATGAGAG TCAGGATTAT GCTAGCTGGT AATGCCTTTA 5700
CTGCGGGGAA GATAATAGTT TCCTGCATAC CCCCTGGTTT TGGTTCACAT AATCTTACTA 5760
TAGCACAAGC AACTCTCTTT CCACATGTGA TTGCTGATGT TAGGACTCTA GACCCCATTG 5820
AGGTGCCTTT GGAAGATGTT AGGAATGTTC TCTTTCATAA TAATGATAGA AATCAACAAA 5880
CCATGCGCCT TGTGTGCATG CTGTACACCC CCCTCCGCAC TGGTGGTGGT ACTGGTGATT 5940
CTTTTGTAGT TGCAGGGCGA GTTATGACTT GCCCCAGTCC TGATTTTAAT TTCTTGTTTT 6000
TAGTCCCTCC TACGGTGGAG CAGAAAACCA GGCCCTTCAC ACTCCCAAAT CTGGCATTGA 6060
GTTCTCTGTC TAACTCACGT GCCCCTCTCC CAATCAGTAG TATGGGCATT TCCCCAGACA 6120
ATGTCCAGAG TGTGCAGTTC CAAAATGGTC GGTGTACTCT GGATGGCCGC CTGGTTGGCA 6180
CCACCCCAGT TTCATTGTCA CATGTTGCCA AGATAAGAGG GACCTCCAAT GGCACTGTAA 6240
TCAACCTTAC TGAATTGGAT GGCACACCCT TTCACCCTTT TGAGGGCCCT GCCCCCATTG 6300
GGTTTCCAGA CCTCGGTGGT TGTGATTGGC ATATCAATAT GACACAGTTT GGCCATTCTA 6360
GCCAGACCCA GTATGATGTA GACACCACCC CTGACACTTT TGTCCCCCAT CTTGGTTCAA 6420
TTCAGGCAAA TGGCATTGGC AGTGGTAATT ATGTTGGTGT TCTTAGCTGG ATTTCCCCCC 6480
CATCACACCC GTCTGGCTCC CAAGTTGACC TTTGGAAGAT CCCCAATTAT GGGTCAAGTA 6540
TTACGGAGGC AACACATCTA GCCCCTTCTG TATACCCCCC TGGTTTCGGA GAGGTATTGG 6600
TCTTTTTCAT GTCAAAAATG CCAGGTCCTG GTGCTTATAA TTTGCCCTGT CTATTACCAC 6660
AAGAGTACAT TTCACATCTT GCTAGTGAAC AAGCCCCTAC TGTAGGTGAG GCTGCCCTGC 6720
TCCACTATGT TGACCCTGAT ACCGGTCGGA ATCTTGGGGA ATTCAAAGCA TACCCTGATG 6780
GTTTCCTCAC TTGTGTCCCC AATGGGGCTA GCTCGGGTCC ACAACAGCTG CCGATCAATG 6840
GGGTCTTTGT CTTTGTTTCA TGGGTGTCCA GATTTTATCA ATTAAAGCCT GTGGGAACTG 6900
CCAGCTCGGC AAGAGGTAGG CTTGGTCTGC GCCGATAATG GCCCAAGCCA TAATTGGTGC 6960
AATTGCTGCT TCCACAGCAG GTAGTGCTCT GGGAGCGGGC ATACAGGTTG GTGGCGAAGC 7020
GGCCCTCCAA AGCCAAAGGT ATCAACAAAA TTTGCAACTG CAAGAAAATT CTTTTAAACA 7080
TGACAGGGAA ATGATTGGGT ATCAGGTTGA AGCTTCAAAT CAATTATTGG CTAAAAATTT 7140
GGCAACTAGA TATTCACTCC TCCGTGCTGG GGGTTTGACC AGTGCTGATG CAGCAAGATC 7200
TGTGGCAGGA GCTCCAGTCA CCCGCATTGT AGATTGGAAT GGCGTGAGAG TGTCTGCTCC 7260
CGAGTCCTCT GCTACCACAT TGAGATCCGG TGGCTTCATG TGAGTTCCCA TACCATTTGC 7320
CTCTAAGCAA AAACAGGTTC AATCATCTGG TATTAGTAAT CCAAATTATT CCCCTTCATC 7380
CATTTCTCGA ACCACTAGTT GGGTCGAGTC ACAAAACTCA TCGAGATTTG GAAATCTTTC 7440
TCCATACCAC GCGGAGGCTC TCAATACAGT GTGGTTGACT CCACCCGGTT CAACAGCCTC 7500
TTCTACACTG TCTTCTGTGC CACGTGGTTA TTTCAATACA GACAGGTTGC CATTATTCGC 7560
AAATAATAGG CGATGATGTT GTAATATGAA ATGTGGGCAT CATATTCATT TAATTAGGTT 7620
TAATTAGGTT TAATTTGATG TTAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 7680

Table 2, continued

AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 7740 
AAAAAAAAAA AAA 7753 
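
The rows of Table 2 appear to follow a fixed layout: blocks of ten bases, followed by the cumulative position of the last base in the row. As an illustrative check only, the following minimal Python sketch assumes that layout and verifies the running count for the final two rows of the table. The function names `parse_row` and `check_rows` are hypothetical and are not part of the specification.

```python
def parse_row(line):
    """Split one table row into (bases, stated end position)."""
    *blocks, end = line.split()
    return "".join(blocks), int(end)


def check_rows(lines, start=0):
    """Confirm each row's trailing number equals the running base count.

    `start` is the position of the last base before the first row given.
    """
    position = start
    for line in lines:
        bases, end = parse_row(line)
        position += len(bases)
        if position != end:
            raise ValueError(f"row ending {end}: counted {position}")
    return position


# The last two rows of Table 2, which follow position 7680.
rows = [
    "AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 7740",
    "AAAAAAAAAA AAA 7753",
]
final_position = check_rows(rows, start=7680)
print(final_position)
```

A mismatch between the counted and stated positions would point to a dropped or duplicated block in a transcribed row.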



WO 94/05700 



PCT/US93/08447 






Table 3. The amino acid sequence deduced from nucleotides 146 through
5359 of the Norwalk virus genome shown in Table 2.

CTCGATAAAG ATAACCAACC AAGAT ATG GCT CTG GGG CTG ATT GGA CAG GTC 172
Met Ala Leu Gly Leu Ile Gly Gln Val
1 5

CCA GCG CCA AAG GCC ACA TCC GTC GAT GTC CCT AAA CAA CAG AGG GAT 220
Pro Ala Pro Lys Ala Thr Ser Val Asp Val Pro Lys Gln Gln Arg Asp
10 15 20 25

AGA CCA CCA CGG ACT GTT GCC GAA GTT CAA CAA AAT TTG CGT TGG ACT 268
Arg Pro Pro Arg Thr Val Ala Glu Val Gln Gln Asn Leu Arg Trp Thr
30 35 40

GAG AGA CCA CAA GAC CAG AAT GTT AAG ACG TGG GAT GAG CTT GAC CAC 316
Glu Arg Pro Gln Asp Gln Asn Val Lys Thr Trp Asp Glu Leu Asp His
45 50 55

ACA ACA AAA CAA CAG ATA CTT GAT GAA CAC GCT GAG TGG TTT GAT GCC 364
Thr Thr Lys Gln Gln Ile Leu Asp Glu His Ala Glu Trp Phe Asp Ala
60 65 70

GGT GGC TTA GGT CCA AGT ACA CTA CCC ACT AGT CAT GAA CGG TAC ACA 412
Gly Gly Leu Gly Pro Ser Thr Leu Pro Thr Ser His Glu Arg Tyr Thr
75 80 85

CAT GAG AAT GAT GAA GGC CAC CAG GTA AAG TGG TCG GCT AGG GAA GGT 460
His Glu Asn Asp Glu Gly His Gln Val Lys Trp Ser Ala Arg Glu Gly
90 95 100 105

GTA GAC CTT GGC ATA TCC GGG CTC ACG ACG GTG TCT GGG CCT GAG TGG 508
Val Asp Leu Gly Ile Ser Gly Leu Thr Thr Val Ser Gly Pro Glu Trp
110 115 120

AAT ATG TGC CCG CTA CCA CCA GTT GAC CAA AGG AGC ACG ACA CCT GCA 556
Asn Met Cys Pro Leu Pro Pro Val Asp Gln Arg Ser Thr Thr Pro Ala
125 130 135

ACT GAG CCC ACA ATT GGT GAC ATG ATC GAA TTC TAT GAA GGG CAC ATC 604
Thr Glu Pro Thr Ile Gly Asp Met Ile Glu Phe Tyr Glu Gly His Ile
140 145 150

TAT CAT TAT GCT ATA TAC ATA GGT CAA GGC AAG ACG GTG GGT GTA CAC 652
Tyr His Tyr Ala Ile Tyr Ile Gly Gln Gly Lys Thr Val Gly Val His
155 160 165

TCC CCT CAA GCA GCC TTC TCA ATA ACG AGG ATC ACC ATA CAG CCC ATA 700
Ser Pro Gln Ala Ala Phe Ser Ile Thr Arg Ile Thr Ile Gln Pro Ile
170 175 180 185

TCA GCT TGG TGG CGA GTC TGT TAT GTC CCA CAA CCA AAA CAG AGG CTC 748
Ser Ala Trp Trp Arg Val Cys Tyr Val Pro Gln Pro Lys Gln Arg Leu
190 195 200

ACA TAC GAC CAA CTC AAA GAA TTA GAA AAT GAA CCA TGG CCG TAT GCC 796
Thr Tyr Asp Gln Leu Lys Glu Leu Glu Asn Glu Pro Trp Pro Tyr Ala
205 210 215

GCA GTC ACG AAC AAC TGC TTC GAA TTT TGT TGC CAG GTC ATG TGC TTG 844
Ala Val Thr Asn Asn Cys Phe Glu Phe Cys Cys Gln Val Met Cys Leu
220 225 230

GAA GAT ACT TGG TTG CAA AGG AAG CTC ATC TCC TCT GGC CGG TTT TAC 892
Glu Asp Thr Trp Leu Gln Arg Lys Leu Ile Ser Ser Gly Arg Phe Tyr
235 240 245

CAC CCG ACC CAA GAT TGG TCC CGA GAC ACT CCA GAA TTC CAA CAA GAC 940
His Pro Thr Gln Asp Trp Ser Arg Asp Thr Pro Glu Phe Gln Gln Asp
250 255 260 265

AGC AAG TTA GAG ATG GTT AGG GAT GCA GTG CTA GCC GCT ATA AAT GGG 988
Ser Lys Leu Glu Met Val Arg Asp Ala Val Leu Ala Ala Ile Asn Gly
270 275 280

TTG GTG TCG CGG CCA TTT AAA GAT CTT CTG GGT AAG CTC AAA CCC TTG 1036
Leu Val Ser Arg Pro Phe Lys Asp Leu Leu Gly Lys Leu Lys Pro Leu
285 290 295

AAC GTG CTT AAC TTA CTT TCA AAC TGT GAT TGG ACG TTC ATG GGG GTC 1084
Asn Val Leu Asn Leu Leu Ser Asn Cys Asp Trp Thr Phe Met Gly Val
300 305 310

GTG GAG ATG GTG GTC CTC CTT TTA GAA CTC TTT GGA ATC TTT TGG AAC 1132
Val Glu Met Val Val Leu Leu Leu Glu Leu Phe Gly Ile Phe Trp Asn
315 320 325

CCA CCT GAT GTT TCC AAC TTT ATA GCT TCA CTC CTG CCA GAT TTC CAT 1180
Pro Pro Asp Val Ser Asn Phe Ile Ala Ser Leu Leu Pro Asp Phe His
330 335 340 345

CTA CAG GGC CCC GAG GAC CTT GCC AGG GAT CTC GTG CCA ATA GTA TTG 1228
Leu Gln Gly Pro Glu Asp Leu Ala Arg Asp Leu Val Pro Ile Val Leu
350 355 360

GGG GGG ATC GGC TTA GCC ATA GGA TTC ACC AGA GAC AAG GTA AGT AAG 1276
Gly Gly Ile Gly Leu Ala Ile Gly Phe Thr Arg Asp Lys Val Ser Lys
365 370 375

ATG ATG AAG AAT GCT GTT GAT GGA CTT CGT GCG GCA ACC CAG CTC GGT 1324
Met Met Lys Asn Ala Val Asp Gly Leu Arg Ala Ala Thr Gln Leu Gly
380 385 390

CAA TAT GGC CTA GAA ATA TTC TCA TTA CTA AAG AAG TAC TTC TTC GGT 1372
Gln Tyr Gly Leu Glu Ile Phe Ser Leu Leu Lys Lys Tyr Phe Phe Gly
395 400 405

GGT GAT CAA ACA GAG AAA ACC CTA AAA GAT ATT GAG TCA GCA GTT ATA 1420
Gly Asp Gln Thr Glu Lys Thr Leu Lys Asp Ile Glu Ser Ala Val Ile
410 415 420 425

GAT ATG GAA GTA CTA TCA TCT ACA TCA GTG ACT CAG CTC GTG AGG GAC 1468
Asp Met Glu Val Leu Ser Ser Thr Ser Val Thr Gln Leu Val Arg Asp
430 435 440

AAA CAG TCT GCA CGG GCT TAT ATG GCC ATC TTA GAT AAT GAA GAA GAA 1516
Lys Gln Ser Ala Arg Ala Tyr Met Ala Ile Leu Asp Asn Glu Glu Glu
445 450 455

AAG GCA AGG AAA TTA TCT GTC AGG AAT GCC GAC CCA CAC GTA GTA TCC 1564
Lys Ala Arg Lys Leu Ser Val Arg Asn Ala Asp Pro His Val Val Ser
460 465 470

TCT ACC AAT GCT CTC ATA TCC CGG ATC TCA ATG GCT AGG GCT GCA TTG 1612
Ser Thr Asn Ala Leu Ile Ser Arg Ile Ser Met Ala Arg Ala Ala Leu
475 480 485



GCC AAG GCT CAA GCT GAA ATG ACC AGC AGG ATG CGT CCT GTG GTC ATT 1660
Ala Lys Ala Gln Ala Glu Met Thr Ser Arg Met Arg Pro Val Val Ile
490 495 500 505

ATG ATG TGT GGG CCC CCT GGT ATA GGT AAA ACC AAG GCA GCA GAA CAT 1708
Met Met Cys Gly Pro Pro Gly Ile Gly Lys Thr Lys Ala Ala Glu His
510 515 520

CTG GCT AAA CGC CTA GCC AAT GAG ATA CGG CCT GGT GGT AAG GTT GGG 1756
Leu Ala Lys Arg Leu Ala Asn Glu Ile Arg Pro Gly Gly Lys Val Gly
525 530 535

CTG GTC CCA CGG GAG GCA GTG GAT CAT TGG GAT GGA TAT CAC GGA GAG 1804
Leu Val Pro Arg Glu Ala Val Asp His Trp Asp Gly Tyr His Gly Glu
540 545 550

GAA GTG ATG CTG TGG GAC GAC TAT GGA ATG ACA AAG ATA CAG GAA GAC 1852
Glu Val Met Leu Trp Asp Asp Tyr Gly Met Thr Lys Ile Gln Glu Asp
555 560 565

TGT AAT AAA CTG CAA GCC ATA GCC GAC TCA GCC CCC CTA ACA CTC AAT 1900
Cys Asn Lys Leu Gln Ala Ile Ala Asp Ser Ala Pro Leu Thr Leu Asn
570 575 580 585

TGT GAC CGA ATA GAA AAC AAG GGA ATG CAA TTT GTG TCT GAT GCT ATA 1948
Cys Asp Arg Ile Glu Asn Lys Gly Met Gln Phe Val Ser Asp Ala Ile
590 595 600

GTC ATC ACC ACC AAT GCT CCT GGC CCA GCC CCA GTG GAC TTT GTC AAC 1996
Val Ile Thr Thr Asn Ala Pro Gly Pro Ala Pro Val Asp Phe Val Asn
605 610 615

CTC GGG CCT GTT TGC CGA AGG GTG GAC TTC CTT GTG TAT TGC ACG GCA 2044
Leu Gly Pro Val Cys Arg Arg Val Asp Phe Leu Val Tyr Cys Thr Ala
620 625 630

CCT GAA GTT GAA CAC ACG AGG AAA GTC AGT CCT GGG GAC ACA ACT GCA 2092
Pro Glu Val Glu His Thr Arg Lys Val Ser Pro Gly Asp Thr Thr Ala
635 640 645

CTG AAA GAC TGC TTC AAG CCC GAT TTC TCA CAT CTA AAA ATG GAG TTG 2140
Leu Lys Asp Cys Phe Lys Pro Asp Phe Ser His Leu Lys Met Glu Leu
650 655 660 665

GCT CCC CAA GGG GGC TTT GAT AAC CAA GGG AAT ACC CCG TTT GGT AAG 2188
Ala Pro Gln Gly Gly Phe Asp Asn Gln Gly Asn Thr Pro Phe Gly Lys
670 675 680

GGT GTG ATG AAG CCC ACC ACC ATA AAC AGG CTG TTA ATC CAG GCT GTA 2236
Gly Val Met Lys Pro Thr Thr Ile Asn Arg Leu Leu Ile Gln Ala Val
685 690 695

GCC TTG ACG ATG GAG AGA CAG GAT GAG TTC CAA CTC CAG GGG CCT ACG 2284
Ala Leu Thr Met Glu Arg Gln Asp Glu Phe Gln Leu Gln Gly Pro Thr
700 705 710

TAT GAC TTT GAT ACT GAC AGA GTA GCT GCG TTC ACG AGG ATG GCC CGA 2332
Tyr Asp Phe Asp Thr Asp Arg Val Ala Ala Phe Thr Arg Met Ala Arg
715 720 725

GCC AAC GGG TTG GGT CTC ATA TCC ATG GCC TCC CTA GGC AAA AAG CTA 2380
Ala Asn Gly Leu Gly Leu Ile Ser Met Ala Ser Leu Gly Lys Lys Leu
730 735 740 745

CGC AGT GTC ACC ACT ATT GAA GGA TTA AAG AAT GCT CTA TCA GGC TAT 2428
Arg Ser Val Thr Thr Ile Glu Gly Leu Lys Asn Ala Leu Ser Gly Tyr
750 755 760

AAA ATA TCA AAA TGC AGT ATA CAA TGG CAG TCA AGG GTG TAC ATT ATA 2476
Lys Ile Ser Lys Cys Ser Ile Gln Trp Gln Ser Arg Val Tyr Ile Ile
765 770 775

GAA TCA GAT GGT GCC AGT GTA CAA ATC AAA GAA GAC AAG CAA GCT TTG 2524
Glu Ser Asp Gly Ala Ser Val Gln Ile Lys Glu Asp Lys Gln Ala Leu
780 785 790

ACC CCT CTG CAG CAG ACA ATT AAC ACG GCC TCA CTT GCC ATC ACT CGA 2572
Thr Pro Leu Gln Gln Thr Ile Asn Thr Ala Ser Leu Ala Ile Thr Arg
795 800 805

CTC AAA GCA GCT AGG GCT GTG GCA TAC GCT TCA TGT TTC CAG TCC GCC 2620
Leu Lys Ala Ala Arg Ala Val Ala Tyr Ala Ser Cys Phe Gln Ser Ala
810 815 820 825

ATA ACT ACC ATA CTA CAA ATG GCG GGA TCT GCG CTC GTT ATT AAT CGA 2668
Ile Thr Thr Ile Leu Gln Met Ala Gly Ser Ala Leu Val Ile Asn Arg
830 835 840

GCG GTC AAG CGT ATG TTT GGT ACC CGT ACA GCA GCC ATG GCA TTA GAA 2716
Ala Val Lys Arg Met Phe Gly Thr Arg Thr Ala Ala Met Ala Leu Glu
845 850 855

GGA CCT GGG AAA GAA CAT AAT TGC AGG GTC CAT AAG GCT AAG GAA GCT 2764
Gly Pro Gly Lys Glu His Asn Cys Arg Val His Lys Ala Lys Glu Ala
860 865 870

GGA AAG GGG CCC ATA GGT CAT GAT GAC ATG GTA GAA AGG TTT GGC CTA 2812
Gly Lys Gly Pro Ile Gly His Asp Asp Met Val Glu Arg Phe Gly Leu
875 880 885

TGT GAA ACT GAA GAG GAG GAG AGT GAG GAC CAA ATT CAA ATG GTA CCA 2860
Cys Glu Thr Glu Glu Glu Glu Ser Glu Asp Gln Ile Gln Met Val Pro
890 895 900 905

AGT GAT GCC GTC CCA GAA GGA AAG AAC AAA GGC AAG ACC AAA AAG GGA 2908
Ser Asp Ala Val Pro Glu Gly Lys Asn Lys Gly Lys Thr Lys Lys Gly
910 915 920

CGT GGT CGC AAA AAT AAC TAT AAT GCA TTC TCT CGC CGT GGT CTG AGT 2956
Arg Gly Arg Lys Asn Asn Tyr Asn Ala Phe Ser Arg Arg Gly Leu Ser
925 930 935

GAT GAA GAA TAT GAA GAG TAC AAA AAG ATC AGA GAA GAA AAG AAT GGC 3004
Asp Glu Glu Tyr Glu Glu Tyr Lys Lys Ile Arg Glu Glu Lys Asn Gly
940 945 950

AAT TAT AGT ATA CAA GAA TAC TTG GAG GAC CGC CAA CGA TAT GAG GAA 3052
Asn Tyr Ser Ile Gln Glu Tyr Leu Glu Asp Arg Gln Arg Tyr Glu Glu
955 960 965

GAA TTA GCA GAG GTA CAG GCA GGT GGT GAT GGT GGC ATA GGA GAA ACT 3100
Glu Leu Ala Glu Val Gln Ala Gly Gly Asp Gly Gly Ile Gly Glu Thr
970 975 980 985

GAA ATG GAA ATC CGT CAC AGG GTC TTC TAT AAA TCC AAG AGT AAG AAA 3148
Glu Met Glu Ile Arg His Arg Val Phe Tyr Lys Ser Lys Ser Lys Lys
990 995 1000

CAC CAA CAA GAG CAA CGG CGA CAA CTT GGT CTA GTG ACT GGA TCA GAC 3196
His Gln Gln Glu Gln Arg Arg Gln Leu Gly Leu Val Thr Gly Ser Asp
1005 1010 1015

ATC AGA AAA CGT AAG CCC ATT GAC TGG ACC CCG CCA AAG AAT GAA TGG 3244
Ile Arg Lys Arg Lys Pro Ile Asp Trp Thr Pro Pro Lys Asn Glu Trp
1020 1025 1030

GCA GAT GAT GAC AGA GAG GTG GAT TAT AAT GAA AAG ATC AAT TTT GAA 3292
Ala Asp Asp Asp Arg Glu Val Asp Tyr Asn Glu Lys Ile Asn Phe Glu
1035 1040 1045

GCT CCC CCG ACA CTA TGG AGC CGA GTC ACA AAG TTT GGA TCA GGA TGG 3340
Ala Pro Pro Thr Leu Trp Ser Arg Val Thr Lys Phe Gly Ser Gly Trp
1050 1055 1060 1065

GGC TTT TGG GTC AGC CCG ACA GTG TTC ATC ACA ACC ACA CAT GTA GTG 3388
Gly Phe Trp Val Ser Pro Thr Val Phe Ile Thr Thr Thr His Val Val
1070 1075 1080

CCA ACT GGT GTG AAA GAA TTC TTT GGT GAG CCC CTA TCT AGT ATA GCA 3436
Pro Thr Gly Val Lys Glu Phe Phe Gly Glu Pro Leu Ser Ser Ile Ala
1085 1090 1095

ATC CAC CAA GCA GGT GAG TTC ACA CAA TTC AGG TTC TCA AAG AAA ATG 3484
Ile His Gln Ala Gly Glu Phe Thr Gln Phe Arg Phe Ser Lys Lys Met
1100 1105 1110

CGC CCT GAC TTG ACA GGT ATG GTC CTT GAA GAA GGT TGC CCT GAA GGG 3532
Arg Pro Asp Leu Thr Gly Met Val Leu Glu Glu Gly Cys Pro Glu Gly
1115 1120 1125

ACA GTC TGC TCA GTC CTA ATT AAA CGG GAT TCG GGT GAA CTA CTT CCG 3580
Thr Val Cys Ser Val Leu Ile Lys Arg Asp Ser Gly Glu Leu Leu Pro
1130 1135 1140 1145

CTA GCC GTC CGT ATG GGG GCT ATT GCC TCC ATG AGG ATA CAG GGT CGG 3628
Leu Ala Val Arg Met Gly Ala Ile Ala Ser Met Arg Ile Gln Gly Arg
1150 1155 1160

CTT GTC CAT GGC CAA TCA GGG ATG TTA CTG ACA GGG GCC AAT GCA AAG 3676
Leu Val His Gly Gln Ser Gly Met Leu Leu Thr Gly Ala Asn Ala Lys
1165 1170 1175

GGG ATG GAT CTT GGC ACT ATA CCA GGA GAC TGC GGG GCA CCA TAC GTC 3724
Gly Met Asp Leu Gly Thr Ile Pro Gly Asp Cys Gly Ala Pro Tyr Val
1180 1185 1190

CAC AAG CGC GGG AAT GAC TGG GTT GTG TGT GGA GTC CAC GCT GCA GCC 3772
His Lys Arg Gly Asn Asp Trp Val Val Cys Gly Val His Ala Ala Ala
1195 1200 1205

ACA AAG TCA GGC AAC ACC GTG GTC TGC GCT GTA CAG GCT GGA GAG GGC 3820
Thr Lys Ser Gly Asn Thr Val Val Cys Ala Val Gln Ala Gly Glu Gly
1210 1215 1220 1225

GAA ACC GCA CTA GAA GGT GGA GAC AAG GGG CAT TAT GCC GGC CAC GAG 3868
Glu Thr Ala Leu Glu Gly Gly Asp Lys Gly His Tyr Ala Gly His Glu
1230 1235 1240

ATT GTG AGG TAT GGA AGT GGC CCA GCA CTG TCA ACT AAA ACA AAA TTC 3916
Ile Val Arg Tyr Gly Ser Gly Pro Ala Leu Ser Thr Lys Thr Lys Phe
1245 1250 1255

TGG AGG TCC TCC CCA GAA CCA CTG CCC CCC GGA GTA TAT GAG CCA GCA 3964
Trp Arg Ser Ser Pro Glu Pro Leu Pro Pro Gly Val Tyr Glu Pro Ala
1260 1265 1270

TAC CTG GGG GGC AAG GAC CCC CGT GTA CAG AAT GGC CCA TCC CTA CAA 4012
Tyr Leu Gly Gly Lys Asp Pro Arg Val Gln Asn Gly Pro Ser Leu Gln
1275 1280 1285

CAG GTA CTA CGT GAC CAA CTG AAA CCC TTT GCG GAC CCC CGC GGC CGC 4060
Gln Val Leu Arg Asp Gln Leu Lys Pro Phe Ala Asp Pro Arg Gly Arg
1290 1295 1300 1305

ATG CCT GAG CCT GGC CTA CTG GAG GCT GCG GTT GAG ACT GTA ACA TCC 4108
Met Pro Glu Pro Gly Leu Leu Glu Ala Ala Val Glu Thr Val Thr Ser
1310 1315 1320

ATG TTA GAA CAG ACA ATG GAT ACC CCA AGC CCG TGG TCT TAC GCT GAT 4156
Met Leu Glu Gln Thr Met Asp Thr Pro Ser Pro Trp Ser Tyr Ala Asp
1325 1330 1335

GCC TGC CAA TCT CTT GAC AAA ACT ACT AGT TCG GGG TAC CCT CAC CAT 4204
Ala Cys Gln Ser Leu Asp Lys Thr Thr Ser Ser Gly Tyr Pro His His
1340 1345 1350

AAA AGG AAG AAT GAT GAT TGG AAT GGC ACC ACC TTC GTT GGA GAG CTC 4252
Lys Arg Lys Asn Asp Asp Trp Asn Gly Thr Thr Phe Val Gly Glu Leu
1355 1360 1365

GGT GAG CAA GCT GCA CAC GCC AAC AAT ATG TAT GAG AAT GCT AAA CAT 4300
Gly Glu Gln Ala Ala His Ala Asn Asn Met Tyr Glu Asn Ala Lys His
1370 1375 1380 1385

ATG AAA CCC ATT TAC ACT GCA GCC TTA AAA GAT GAA CTA GTC AAG CCA 4348
Met Lys Pro Ile Tyr Thr Ala Ala Leu Lys Asp Glu Leu Val Lys Pro
1390 1395 1400

GAA AAG ATT TAT CAA AAA GTC AAG AAG CGT CTA CTA TGG GGC GCC GAT 4396
Glu Lys Ile Tyr Gln Lys Val Lys Lys Arg Leu Leu Trp Gly Ala Asp
1405 1410 1415

CTC GGA ACA GTG GTC AGG GCC GCC CGG GCT TTT GGC CCA TTT TGT GAC 4444
Leu Gly Thr Val Val Arg Ala Ala Arg Ala Phe Gly Pro Phe Cys Asp
1420 1425 1430

GCT ATA AAA TCA CAT GTC ATC AAA TTG CCA ATA AAA GTT GGC ATG AAC 4492
Ala Ile Lys Ser His Val Ile Lys Leu Pro Ile Lys Val Gly Met Asn
1435 1440 1445

ACA ATA GAA GAT GGC CCC CTC ATC TAT GCT GAG CAT GCT AAA TAT AAG 4540
Thr Ile Glu Asp Gly Pro Leu Ile Tyr Ala Glu His Ala Lys Tyr Lys
1450 1455 1460 1465

AAT CAT TTT GAT GCA GAT TAT ACA GCA TGG GAC TCA ACA CAA AAT AGA 4588
Asn His Phe Asp Ala Asp Tyr Thr Ala Trp Asp Ser Thr Gln Asn Arg
1470 1475 1480

CAA ATT ATG ACA GAA TCC TTC TCC ATT ATG TCG CGC CTT ACG GCC TCA 4636
Gln Ile Met Thr Glu Ser Phe Ser Ile Met Ser Arg Leu Thr Ala Ser
1485 1490 1495

CCA GAA TTG GCC GAG GTT GTG GCC CAA GAT TTG CTA GCA CCA TCT GAG 4684
Pro Glu Leu Ala Glu Val Val Ala Gln Asp Leu Leu Ala Pro Ser Glu
1500 1505 1510



ATG GAT GTA GGT GAT TAT GTC ATC AGG GTC AAA GAG GGG CTG CCA TCT 4732
Met Asp Val Gly Asp Tyr Val Ile Arg Val Lys Glu Gly Leu Pro Ser
1515 1520 1525

GGA TTC CCA TGT ACT TCC CAG GTG AAC AGC ATA AAT CAC TGG ATA ATT 4780
Gly Phe Pro Cys Thr Ser Gln Val Asn Ser Ile Asn His Trp Ile Ile
1530 1535 1540 1545

ACT CTC TGT GCA CTG TCT GAG GCC ACT GGT TTA TCA CCT GAT GTG GTG 4828
Thr Leu Cys Ala Leu Ser Glu Ala Thr Gly Leu Ser Pro Asp Val Val
1550 1555 1560

CAA TCC ATG TCA TAT TTC TCA TTT TAT GGT GAT GAT GAG ATT GTG TCA 4876
Gln Ser Met Ser Tyr Phe Ser Phe Tyr Gly Asp Asp Glu Ile Val Ser
1565 1570 1575

ACT GAC ATA GAT TTT GAC CCA GCC CGC CTC ACT CAA ATT CTC AAG GAA 4924
Thr Asp Ile Asp Phe Asp Pro Ala Arg Leu Thr Gln Ile Leu Lys Glu
1580 1585 1590

TAT GGC CTC AAA CCA ACA AGG CCT GAC AAA ACA GAA GGA CCA ATA CAA 4972
Tyr Gly Leu Lys Pro Thr Arg Pro Asp Lys Thr Glu Gly Pro Ile Gln
1595 1600 1605

GTG AGG AAA AAT GTG GAT GGA CTG GTC TTC TTG CGG CGC ACC ATT TCC 5020
Val Arg Lys Asn Val Asp Gly Leu Val Phe Leu Arg Arg Thr Ile Ser
1610 1615 1620 1625

CGT GAT GCG GCA GGG TTC CAA GGC AGG TTA GAT AGG GCT TCG ATT GAA 5068
Arg Asp Ala Ala Gly Phe Gln Gly Arg Leu Asp Arg Ala Ser Ile Glu
1630 1635 1640

CGC CAA ATC TTC TGG ACC CGC GGG CCC AAT CAT TCA GAT CCA TCA GAG 5116
Arg Gln Ile Phe Trp Thr Arg Gly Pro Asn His Ser Asp Pro Ser Glu
1645 1650 1655

ACT CTA GTG CCA CAC ACT CAA AGA AAA ATA CAG TTG ATT TCA CTT CTA 5164
Thr Leu Val Pro His Thr Gln Arg Lys Ile Gln Leu Ile Ser Leu Leu
1660 1665 1670

GGG GAA GCT TCA CTC CAT GGT GAG AAA TTT TAC AGA AAG ATT TCC AGC 5212
Gly Glu Ala Ser Leu His Gly Glu Lys Phe Tyr Arg Lys Ile Ser Ser
1675 1680 1685

AAG GTC ATA CAT GAA ATC AAG ACT GGT GGA TTG GAA ATG TAT GTC CCA 5260
Lys Val Ile His Glu Ile Lys Thr Gly Gly Leu Glu Met Tyr Val Pro
1690 1695 1700 1705

GGA TGG CAG GCC ATG TTC CGC TGG ATG CGC TTC CAT GAC CTC GGA TTG 5308
Gly Trp Gln Ala Met Phe Arg Trp Met Arg Phe His Asp Leu Gly Leu
1710 1715 1720

TGG ACA GGA GAT CGC GAT CTT CTG CCC GAA TTC GTA AAT GAT GAT GGC 5356
Trp Thr Gly Asp Arg Asp Leu Leu Pro Glu Phe Val Asn Asp Asp Gly
1725 1730 1735

GTC TAAGGACGCT ACATCAAGCG TGGATGGCGC TAGTGGCGCT GGTCAGTTGG 5409
Val
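
The pairing of codons and residues in Table 3 follows the standard genetic code. The following minimal Python sketch, which assumes the standard code and uses the hypothetical names `CODON_TABLE`, `THREE_LETTER` and `translate`, reproduces the first and last residues of the table from the corresponding codons (nucleotides 146 through 172, and the codons ending at the TAA stop after nucleotide 5359).

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(dna):
    """Translate a DNA string codon by codon, stopping at the first stop."""
    dna = "".join(dna.split()).upper()
    residues = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # TAA, TAG or TGA
            break
        residues.append(THREE_LETTER[aa])
    return " ".join(residues)


first_row = translate("ATG GCT CTG GGG CTG ATT GGA CAG GTC")
last_codons = translate("GAT GAT GGC GTC TAA GGA CGC")
print(first_row)    # Met Ala Leu Gly Leu Ile Gly Gln Val
print(last_codons)  # Asp Asp Gly Val
```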
Table 4. The amino acid sequence deduced from nucleotides 5346 through
6935 of the Norwalk virus genome shown in Table 2.

CGTAA ATG ATG ATG GCG TCT AAG GAC GCT ACA TCA AGC GTG GAT GGC 5387
Met Met Met Ala Ser Lys Asp Ala Thr Ser Ser Val Asp Gly
1 5 10

GCT AGT GGC GCT GGT CAG TTG GTA CCG GAG GTT AAT GCT TCT GAC CCT 5435
Ala Ser Gly Ala Gly Gln Leu Val Pro Glu Val Asn Ala Ser Asp Pro
15 20 25 30

CTT GCA ATG GAT CCT GTA GCA GGT TCT TCG ACA GCA GTC GCG ACT GCT 5483
Leu Ala Met Asp Pro Val Ala Gly Ser Ser Thr Ala Val Ala Thr Ala
35 40 45

GGA CAA GTT AAT CCT ATT GAT CCC TGG ATA ATT AAT AAT TTT GTG CAA 5531
Gly Gln Val Asn Pro Ile Asp Pro Trp Ile Ile Asn Asn Phe Val Gln
50 55 60

GCC CCC CAA GGT GAA TTT ACT ATT TCC CCA AAT AAT ACC CCC GGT GAT 5579
Ala Pro Gln Gly Glu Phe Thr Ile Ser Pro Asn Asn Thr Pro Gly Asp
65 70 75

GTT TTG TTT GAT TTG AGT TTG GGT CCC CAT CTT AAT CCT TTC TTG CTC 5627
Val Leu Phe Asp Leu Ser Leu Gly Pro His Leu Asn Pro Phe Leu Leu
80 85 90

CAT CTA TCA CAA ATG TAT AAT GGT TGG GTT GGT AAC ATG AGA GTC AGG 5675
His Leu Ser Gln Met Tyr Asn Gly Trp Val Gly Asn Met Arg Val Arg
95 100 105 110

ATT ATG CTA GCT GGT AAT GCC TTT ACT GCG GGG AAG ATA ATA GTT TCC 5723
Ile Met Leu Ala Gly Asn Ala Phe Thr Ala Gly Lys Ile Ile Val Ser
115 120 125

TGC ATA CCC CCT GGT TTT GGT TCA CAT AAT CTT ACT ATA GCA CAA GCA 5771
Cys Ile Pro Pro Gly Phe Gly Ser His Asn Leu Thr Ile Ala Gln Ala
130 135 140

ACT CTC TTT CCA CAT GTG ATT GCT GAT GTT AGG ACT CTA GAC CCC ATT 5819
Thr Leu Phe Pro His Val Ile Ala Asp Val Arg Thr Leu Asp Pro Ile
145 150 155

GAG GTG CCT TTG GAA GAT GTT AGG AAT GTT CTC TTT CAT AAT AAT GAT 5867
Glu Val Pro Leu Glu Asp Val Arg Asn Val Leu Phe His Asn Asn Asp
160 165 170

AGA AAT CAA CAA ACC ATG CGC CTT GTG TGC ATG CTG TAC ACC CCC CTC 5915
Arg Asn Gln Gln Thr Met Arg Leu Val Cys Met Leu Tyr Thr Pro Leu
175 180 185 190

CGC ACT GGT GGT GGT ACT GGT GAT TCT TTT GTA GTT GCA GGG CGA GTT 5963
Arg Thr Gly Gly Gly Thr Gly Asp Ser Phe Val Val Ala Gly Arg Val
195 200 205

ATG ACT TGC CCC AGT CCT GAT TTT AAT TTC TTG TTT TTA GTC CCT CCT 6011
Met Thr Cys Pro Ser Pro Asp Phe Asn Phe Leu Phe Leu Val Pro Pro
210 215 220

ACG GTG GAG CAG AAA ACC AGG CCC TTC ACA CTC CCA AAT CTG CCA TTG 6059
Thr Val Glu Gln Lys Thr Arg Pro Phe Thr Leu Pro Asn Leu Pro Leu
225 230 235

AGT TCT CTG TCT AAC TCA CGT GCC CCT CTC CCA ATC AGT AGT ATG GGC 6107
Ser Ser Leu Ser Asn Ser Arg Ala Pro Leu Pro Ile Ser Ser Met Gly
240 245 250

ATT TCC CCA GAC AAT GTC CAG AGT GTG CAG TTC CAA AAT GGT CGG TGT 6155
Ile Ser Pro Asp Asn Val Gln Ser Val Gln Phe Gln Asn Gly Arg Cys
255 260 265 270

ACT CTG GAT GGC CGC CTG GTT GGC ACC ACC CCA GTT TCA TTG TCA CAT 6203
Thr Leu Asp Gly Arg Leu Val Gly Thr Thr Pro Val Ser Leu Ser His
275 280 285

GTT GCC AAG ATA AGA GGG ACC TCC AAT GGC ACT GTA ATC AAC CTT ACT 6251
Val Ala Lys Ile Arg Gly Thr Ser Asn Gly Thr Val Ile Asn Leu Thr
290 295 300

GAA TTG GAT GGC ACA CCC TTT CAC CCT TTT GAG GGC CCT GCC CCC ATT 6299
Glu Leu Asp Gly Thr Pro Phe His Pro Phe Glu Gly Pro Ala Pro Ile
305 310 315

GGG TTT CCA GAC CTC GGT GGT TGT GAT TGG CAT ATC AAT ATG ACA CAG 6347
Gly Phe Pro Asp Leu Gly Gly Cys Asp Trp His Ile Asn Met Thr Gln
320 325 330

TTT GGC CAT TCT AGC CAG ACC CAG TAT GAT GTA GAC ACC ACC CCT GAC 6395
Phe Gly His Ser Ser Gln Thr Gln Tyr Asp Val Asp Thr Thr Pro Asp
335 340 345 350

ACT TTT GTC CCC CAT CTT GGT TCA ATT CAG GCA AAT GGC ATT GGC AGT 6443
Thr Phe Val Pro His Leu Gly Ser Ile Gln Ala Asn Gly Ile Gly Ser
355 360 365

GGT AAT TAT GTT GGT GTT CTT AGC TGG ATT TCC CCC CCA TCA CAC CCG 6491
Gly Asn Tyr Val Gly Val Leu Ser Trp Ile Ser Pro Pro Ser His Pro
370 375 380

TCT GGC TCC CAA GTT GAC CTT TGG AAG ATC CCC AAT TAT GGG TCA AGT 6539
Ser Gly Ser Gln Val Asp Leu Trp Lys Ile Pro Asn Tyr Gly Ser Ser
385 390 395

ATT ACG GAG GCA ACA CAT CTA GCC CCT TCT GTA TAC CCC CCT GGT TTC 6587
Ile Thr Glu Ala Thr His Leu Ala Pro Ser Val Tyr Pro Pro Gly Phe
400 405 410

GGA GAG GTA TTG GTC TTT TTC ATG TCA AAA ATG CCA GGT CCT GGT GCT 6635
Gly Glu Val Leu Val Phe Phe Met Ser Lys Met Pro Gly Pro Gly Ala
415 420 425 430

TAT AAT TTG CCC TGT CTA TTA CCA CAA GAG TAC ATT TCA CAT CTT GCT 6683
Tyr Asn Leu Pro Cys Leu Leu Pro Gln Glu Tyr Ile Ser His Leu Ala
435 440 445

AGT GAA CAA GCC CCT ACT GTA GGT GAG GCT GCC CTG CTC CAC TAT GTT 6731
Ser Glu Gln Ala Pro Thr Val Gly Glu Ala Ala Leu Leu His Tyr Val
450 455 460

GAC CCT GAT ACC GGT CGG AAT CTT GGG GAA TTC AAA GCA TAC CCT GAT 6779
Asp Pro Asp Thr Gly Arg Asn Leu Gly Glu Phe Lys Ala Tyr Pro Asp
465 470 475

GGT TTC CTC ACT TGT GTC CCC AAT GGG GCT AGC TCG GGT CCA CAA CAG 6827
Gly Phe Leu Thr Cys Val Pro Asn Gly Ala Ser Ser Gly Pro Gln Gln
480 485 490

CTG CCG ATC AAT GGG GTC TTT GTC TTT GTT TCA TGG GTG TCC AGA TTT 6875
Leu Pro Ile Asn Gly Val Phe Val Phe Val Ser Trp Val Ser Arg Phe
495 500 505 510

TAT CAA TTA AAG CCT GTG GGA ACT GCC AGC TCG GCA AGA GGT AGG CTT 6923
Tyr Gln Leu Lys Pro Val Gly Thr Ala Ser Ser Ala Arg Gly Arg Leu
515 520 525

GGT CTG CGC CGA TAATGGCCCA AGCCATAATT GGTGCAATTG CTGCTTCCAC 6975
Gly Leu Arg Arg
530

Table 5. The amino acid sequence deduced from nucleotides 6938 through
7573 of the Norwalk virus genome shown in Table 2.

CCAGCTCGGC AAGAGGTAGG CTTGGTCTGC GCCGATA ATG GCC CAA GCC ATA ATT 6955
Met Ala Gln Ala Ile Ile
1 5

GGT GCA ATT GCT GCT TCC ACA GCA GGT AGT GCT CTG GGA GCG GGC ATA 7003
Gly Ala Ile Ala Ala Ser Thr Ala Gly Ser Ala Leu Gly Ala Gly Ile
10 15 20

CAG GTT GGT GGC GAA GCG GCC CTC CAA AGC CAA AGG TAT CAA CAA AAT 7051
Gln Val Gly Gly Glu Ala Ala Leu Gln Ser Gln Arg Tyr Gln Gln Asn
25 30 35

TTG CAA CTG CAA GAA AAT TCT TTT AAA CAT GAC AGG GAA ATG ATT GGG 7099
Leu Gln Leu Gln Glu Asn Ser Phe Lys His Asp Arg Glu Met Ile Gly
40 45 50

TAT CAG GTT GAA GCT TCA AAT CAA TTA TTG GCT AAA AAT TTG GCA ACT 7147
Tyr Gln Val Glu Ala Ser Asn Gln Leu Leu Ala Lys Asn Leu Ala Thr
55 60 65 70

AGA TAT TCA CTC CTC CGT GCT GGG GGT TTG ACC AGT GCT GAT GCA GCA 7195
Arg Tyr Ser Leu Leu Arg Ala Gly Gly Leu Thr Ser Ala Asp Ala Ala
75 80 85

AGA TCT GTG GCA GGA GCT CCA GTC ACC CGC ATT GTA GAT TGG AAT GGC 7243
Arg Ser Val Ala Gly Ala Pro Val Thr Arg Ile Val Asp Trp Asn Gly
90 95 100

GTG AGA GTG TCT GCT CCC GAG TCC TCT GCT ACC ACA TTG AGA TCC GGT 7291
Val Arg Val Ser Ala Pro Glu Ser Ser Ala Thr Thr Leu Arg Ser Gly
105 110 115

GGC TTC ATG TCA GTT CCC ATA CCA TTT GCC TCT AAG CAA AAA CAG GTT 7339
Gly Phe Met Ser Val Pro Ile Pro Phe Ala Ser Lys Gln Lys Gln Val
120 125 130

CAA TCA TCT GGT ATT AGT AAT CCA AAT TAT TCC CCT TCA TCC ATT TCT 7387
Gln Ser Ser Gly Ile Ser Asn Pro Asn Tyr Ser Pro Ser Ser Ile Ser
135 140 145 150

CGA ACC ACT AGT TGG GTC GAG TCA CAA AAC TCA TCG AGA TTT GGA AAT 7435
Arg Thr Thr Ser Trp Val Glu Ser Gln Asn Ser Ser Arg Phe Gly Asn
155 160 165

CTT TCT CCA TAC CAC GCG GAG GCT CTC AAT ACA GTG TGG TTG ACT CCA 7483
Leu Ser Pro Tyr His Ala Glu Ala Leu Asn Thr Val Trp Leu Thr Pro
170 175 180

CCC GGT TCA ACA GCC TCT TCT ACA CTG TCT TCT GTG CCA CGT GGT TAT 7531
Pro Gly Ser Thr Ala Ser Ser Thr Leu Ser Ser Val Pro Arg Gly Tyr
185 190 195

TTC AAT ACA GAC AGG TTG CCA TTA TTC GCA AAT AAT AGG CGA 7573
Phe Asn Thr Asp Arg Leu Pro Leu Phe Ala Asn Asn Arg Arg
200 205 210
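
The nucleotide ranges given in the captions of Tables 3 through 5 can be checked arithmetically. The following minimal Python sketch, offered only as an illustration, assumes the three ranges are inclusive and 1-based; the names `CDS`, `describe` and `overlap` are hypothetical. It computes the length of each range, the number of whole codons, the reading frame of the first base, and the nucleotides shared by the first two ranges.

```python
# Inclusive, 1-based nucleotide ranges from the captions of Tables 3-5.
CDS = {
    "Table 3": (146, 5359),
    "Table 4": (5346, 6935),
    "Table 5": (6938, 7573),
}


def describe(start, end):
    """Return (length, whole codons, leftover bases, frame of first base)."""
    length = end - start + 1
    codons, remainder = divmod(length, 3)
    frame = (start - 1) % 3  # 0, 1 or 2 relative to nucleotide 1
    return length, codons, remainder, frame


def overlap(a, b):
    """Number of nucleotides shared by two inclusive ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


summary = {name: describe(*span) for name, span in CDS.items()}
shared = overlap(CDS["Table 3"], CDS["Table 4"])

for name, (length, codons, remainder, frame) in summary.items():
    print(name, length, codons, remainder, frame)
print("overlap:", shared)
```

Under these assumptions each range is a whole number of codons (1738, 530 and 212, matching the residue numbering in the tables), the second range starts in a different frame from the first, and the two share the 14 nucleotides 5346 through 5359.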
Table 6. Primers used for detection of Norwalk-related
virus by PCR

SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANT: Matson, David O
Estes, Mary K
Jiang, Xi
Graham, David Y

(ii) TITLE OF INVENTION: Methods and Reagents to Detect and 
Characterize Norwalk and Related Viruses 

(iii) NUMBER OF SEQUENCES: 75 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Fulbright & Jaworski Patent Dept 

(B) STREET: 1301 McKinney, Suite 5100 
(C) CITY: Houston

(D) STATE: Texas 

(E) COUNTRY: USA 

(F) ZIP: 77010-3095 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Launer, Charlene A 

(B) REGISTRATION NUMBER: 33,035 

(C) REFERENCE/DOCKET NUMBER: D-5526 

(ix) TELECOMMUNICATION INFORMATION: 
(A) TELEPHONE: 713-651-3634 
(B) TELEFAX: 713-651-5246
(C) TELEX: Western Union 762829 



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7753 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: cDNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Norwalk virus 

(B) STRAIN: 8FIIa 

(C) INDIVIDUAL ISOLATE: 8FIIa 



(vii) IMMEDIATE SOURCE: 

(B) CLONE: pUCNV-953 and its derivatives 



(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 146..5359

(D) OTHER INFORMATION: /note= "The protein encoded by
nucleotides 146 through 5359 is eventually cleaved
to make at least a picornavirus 2C-like protein, a
3C-like protease and an RNA-dependent RNA polymerase."

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 5346..6935

(D) OTHER INFORMATION: /note= "Nucleotides 5346 through
5359 are used for coding two different amino acid
sequences: the first is coded by nucleotides 146
through 5359, the second by nucleotides 5346 through
6935."



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 6938..7573

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:



GGCGTCAAAA GACGTCGTTC CTACTGCTGC TAGCAGTGAA AATGCTAACA ACAATAGTAG 60 

TATTAAGTCT CGTCTATTGG CGAGACTCAA GGGTTCAGGT GGGGCTACGT CCCCACCCAA 120 

CTCGATAAAG ATAACCAACC AAGATATGGC TCTGGGGCTG ATTGGACAGG TCCCAGCGCC 180

AAAGGCCACA TCCGTCGATG TCCCTAAACA ACAGAGGGAT AGACCACCAC GGACTGTTGC 240 

CGAAGTTCAA CAAAATTTGC GTTGGACTGA GAGACCACAA GACCAGAATG TTAAGACGTG 300 

GGATGAGCTT GACCACACAA CAAAACAACA GATACTTGAT GAACACGCTG AGTGGTTTGA 360 

TGCCGGTGGC TTAGGTCCAA GTACACTACC CACTAGTCAT GAACGGTACA CACATGAGAA 420 

TGATGAAGGC CACCAGGTAA AGTGGTCGGC TAGGGAAGGT GTAGACCTTG GCATATCCGG 480 

GCTCACGACG GTGTCTGGGC CTGAGTGGAA TATGTGCCCG CTACCACCAG TTGACCAAAG 540 

GAGCACGACA CCTGCAACTG AGCCCACAAT TGGTGACATG ATCGAATTCT ATGAAGGGCA 600 

CATCTATCAT TATGCTATAT ACATAGGTCA AGGCAAGACG GTGGGTGTAC ACTCCCCTCA 660 

AGCAGCCTTC TCAATAACGA GGATCACCAT ACAGCCCATA TCAGCTTGGT GGCGAGTCTG 720

TTATGTCCCA CAACCAAAAC AGAGGCTCAC ATACGACCAA CTCAAAGAAT TAGAAAATGA 780 

ACCATGGCCG TATGCCGCAG TCACGAACAA CTGCTTCGAA TTTTGTTGCC AGGTCATGTG 840 

CTTGGAAGAT ACTTGGTTGC AAAGGAAGCT CATCTCCTCT GGCCGGTTTT ACCACCCGAC 900 

CCAAGATTGG TCCCGAGACA CTCCAGAATT CCAACAAGAC AGCAAGTTAG AGATGGTTAG 960 

GGATGCAGTG CTAGCCGCTA TAAATGGGTT GGTGTCGCGG CCATTTAAAG ATCTTCTGGG 1020 

TAAGCTCAAA CCCTTGAACG TGCTTAACTT ACTTTCAAAC TGTGATTGGA CGTTCATGGG 1080 

GGTCGTGGAG ATGGTGGTCC TCCTTTTAGA ACTCTTTGGA ATCTTTTGGA ACCCACCTGA 1140 

TGTTTCCAAC TTTATAGCTT CACTCCTGCC AGATTTCCAT CTACAGGGCC CCGAGGACCT 1200 

TGCCAGGGAT CTCGTGCCAA TAGTATTGGG GGGGATCGGC TTAGCCATAG GATTCACCAG 1260 

AGACAAGGTA AGTAAGATGA TGAAGAATGC TGTTGATGGA CTTCGTGCGG CAACCCAGCT 1320 

CGGTCAATAT GGCCTAGAAA TATTCTCATT ACTAAAGAAG TACTTCTTCG GTGGTGATCA 1380 

AACAGAGAAA ACCCTAAAAG ATATTGAGTC AGCAGTTATA GATATGGAAG TACTATCATC 1440 

TACATCAGTG ACTCAGCTCG TGAGGGACAA ACAGTCTGCA CGGGCTTATA TGGCCATCTT 1500 

AGATAATGAA GAAGAAAAGG CAAGGAAATT ATCTGTCAGG AATGCCGACC CACACGTAGT 1560 

ATCCTCTACC AATGCTCTCA TATCCCGGAT CTCAATGGCT AGGGCTGCAT TGGCCAAGGC 1620 

TCAAGCTGAA ATGACCAGCA GGATGCGTCC TGTGGTCATT ATGATGTGTG GGCCCCCTGG 1680 

TATAGGTAAA ACCAAGGCAG CAGAACATCT GGCTAAACGC CTAGCCAATG AGATACGGCC 1740 

TGGTGGTAAG GTTGGGCTGG TCCCACGGGA GGCAGTGGAT CATTGGGATG GATATCACGG 1800 

AGAGGAAGTG ATGCTGTGGG ACGACTATGG AATGACAAAG ATACAGGAAG ACTGTAATAA 1860 

ACTGCAAGCC ATAGCCGACT CAGCCCCCCT AACACTCAAT TGTGACCGAA TAGAAAACAA 1920 

GGGAATGCAA TTTGTGTCTG ATGCTATAGT CATCACCACC AATGCTCCTG GCCCAGCCCC 1980 

AGTGGACTTT GTCAACCTCG GGCCTGTTTG CCGAAGGGTG GACTTCCTTG TGTATTGCAC 2040 

GGCACCTGAA GTTGAACACA CGAGGAAAGT CAGTCCTGGG GACACAACTG CACTGAAAGA 2100 

CTGCTTCAAG CCCGATTTCT CACATCTAAA AATGGAGTTG GCTCCCCAAG GGGGCTTTGA 2160 

TAACCAAGGG AATACCCCGT TTGGTAAGGG TGTGATGAAG CCCACCACCA TAAACAGGCT 2220

GTTAATCCAG GCTGTAGCCT TGACGATGGA GAGACAGGAT GAGTTCCAAC TCCAGGGGCC 2280 

TACGTATGAC TTTGATACTG ACAGAGTAGC TGCGTTCACG AGGATGGCCC GAGCCAACGG 2340 

GTTGGGTCTC ATATCCATGG CCTCCCTAGG CAAAAAGCTA CGCAGTGTCA CCACTATTGA 2400 

AGGATTAAAG AATGCTCTAT CAGGCTATAA AATATCAAAA TGCAGTATAC AATGGCAGTC 2460 

AAGGGTGTAC ATTATAGAAT CAGATGGTGC CAGTGTACAA ATCAAAGAAG ACAAGCAAGC 2520 

TTTGACCCCT CTGCAGCAGA CAATTAACAC GGCCTCACTT GCCATCACTC GACTCAAAGC 2580

AGCTAGGGCT GTGGCATACG CTTCATGTTT CCAGTCCGCC ATAACTACCA TACTACAAAT 2640

GGCGGGATCT GCGCTCGTTA TTAATCGAGC GGTCAAGCGT ATGTTTGGTA CCCGTACAGC 2700 

AGCCATGGCA TTAGAAGGAC CTGGGAAAGA ACATAATTGC AGGGTCCATA AGGCTAAGGA 2760 

AGCTGGAAAG GGGCCCATAG GTCATGATGA CATGGTAGAA AGGTTTGGCC TATGTGAAAC 2820 

TGAAGAGGAG GAGAGTGAGG ACCAAATTCA AATGGTACCA AGTGATGCCG TCCCAGAAGG 2880 

AAAGAAGAAA GGCAAGACCA AAAAGGGACG TGGTCGCAAA AATAACTATA ATGCATTCTC 2940 

TCGCCGTGGT CTGAGTGATG AAGAATATGA AGAGTACAAA AAGATCAGAG AAGAAAAGAA 3000 

TGGCAATTAT AGTATACAAG AATACTTGGA GGACCGCCAA CGATATGAGG AAGAATTAGC 3060 

AGAGGTACAG GCAGGTGGTG ATGGTGGGAT AGGAGAAACT GAAATGGAAA TCCGTCACAG 3120 

GGTCTTCTAT AAATCCAAGA GTAAGAAACA CCAACAAGAG CAACGGCGAC AACTTGGTCT 3180 

AGTGACTGGA TCAGACATCA GAAAACGTAA GCCCATTGAC TGGACCCCGC CAAAGAATGA 3240 

ATGGGCAGAT GATGACAGAG AGGTGGATTA TAATGAAAAG ATCAATTTTG AAGCTCCCCC 3300 

GACACTATGG AGCCGAGTCA CAAAGTTTGG ATCAGGATGG GGCTTTTGGG TCAGCCCGAC 3360 

AGTGTTCATC ACAACCACAC ATGTAGTGCC AACTGGTGTG AAAGAATTCT TTGGTGAGCC 3420 

CCTATCTAGT ATAGCAATCC ACCAAGCAGG TGAGTTCACA CAATTCAGGT TCTCAAAGAA 3480 

AATGCGCCCT GACTTGACAG GTATGGTCCT TGAAGAAGGT TGCCCTGAAG GGACAGTCTG 3540 

CTCAGTCCTA ATTAAACGGG ATTCGGGTGA ACTACTTCCG CTAGCCGTCC GTATGGGGGC 3600 

TATTGCCTCC ATGAGGATAC AGGGTCGGCT TGTCCATGGC CAATCAGGGA TGTTACTGAC 3660 

AGGGGCCAAT GCAAAGGGGA TGGATCTTGG CACTATACCA GGAGACTGCG GGGCACCATA 3720 

CGTCCACAAG CGCGGGAATG ACTGGGTTGT GTGTGGAGTC CACGCTGCAG CCACAAAGTC 3780 

AGGCAAGACC GTGGTCTGCG CTGTACAGGC TGGAGAGGGC GAAACCGCAC TAGAAGGTGG 3840 

AGACAAGGGG CATTATGCCG GCCACGAGAT TGTGAGGTAT GGAAGTGGCC CAGCACTGTC 3900 

AACTAAAACA AAATTCTGGA GGTCCTCCCC AGAACCACTG CCCCCCGGAG TATATGAGCC 3960 

AGCATACCTG GGGGGCAAGG ACCCCCGTGT ACAGAATGGC CCATCCCTAC AACAGGTACT 4020 

ACGTGACCAA CTGAAACCCT TTGCGGACCC CCGCGGCCGC ATGCCTGAGC CTGGCCTACT 4080 

GGAGGCTGCG GTTGAGACTG TAACATCCAT GTTAGAACAG ACAATGGATA CCCCAAGCCC 4140 

GTGGTCTTAC GCTGATGCCT GCCAATCTCT TGACAAAACT ACTAGTTCGG GGTACCCTCA 4200 

CCATAAAAGG AAGAATGATG ATTGGAATGG CACCACCTTC GTTGGAGAGC TCGGTGAGCA 4260 

AGCTGCACAC GCCAACAATA TGTATGAGAA TGCTAAACAT ATGAAACCCA TTTACACTGC 4320 

AGCCTTAAAA GATGAACTAG TCAAGCCAGA AAAGATTTAT CAAAAAGTCA AGAAGCGTCT 4380 

ACTATGGGGC GCCGATCTCG GAACAGTGGT CAGGGCCGCC CGGGCTTTTG GCCCATTTTG 4440 

TGACGCTATA AAATCACATG TCATCAAATT GCCAATAAAA GTTGGCATGA ACACAATAGA 4500 

AGATGGCCCC CTCATCTATG CTGAGCATGC TAAATATAAG AATCATTTTG ATGCAGATTA 4560 

TACAGCATGG GACTCAACAC AAAATAGACA AATTATGACA GAATCCTTCT CCATTATGTC 4620 

GCGCCTTACG GCCTCACCAG AATTGGCCGA GGTTGTGGCC CAAGATTTGC TAGCACCATC 4680 

TGAGATGGAT GTAGGTGATT ATGTCATCAG GGTCAAAGAG GGGCTGCCAT CTGGATTCCC 4740 

ATGTACTTCC CAGGTGAACA GCATAAATCA CTGGATAATT ACTCTCTGTG CACTGTCTGA 4800 

GGCCACTGGT TTATCACCTG ATGTGGTGCA ATCCATGTCA TATTTCTCAT TTTATGGTGA 4860 

TGATGAGATT GTGTCAACTG ACATAGATTT TGACCCAGCC CGCCTCACTC AAATTCTCAA 4920 

GGAATATGGC CTCAAACCAA CAAGGCCTGA CAAAACAGAA GGACCAATAC AAGTGAGGAA 4980 

AAATGTGGAT GGACTGGTCT TCTTGCGGCG CACCATTTCC CGTGATGCGG CAGGGTTCCA 5040 

AGGCAGGTTA GATAGGGCTT CGATTGAACG CCAAATCTTC TGGACCCGCG GGCCCAATCA 5100 

TTCAGATCCA TCAGAGACTC TAGTGCCACA CACTCAAAGA AAAATACAGT TGATTTCACT 5160 

TCTAGGGGAA GCTTCACTCC ATGGTGAGAA ATTTTACAGA AAGATTTCCA GCAAGGTCAT 5220 

ACATGAAATC AAGACTGGTG GATTGGAAAT GTATGTCCCA GGATGGCAGG CCATGTTCCG 5280 

CTGGATGCGC TTCCATGACC TCGGATTGTG GACAGGAGAT CGCGATCTTC TGCCCGAATT 5340 

CGTAAATGAT GATGGCGTCT AAGGACGCTA CATCAAGCGT GGATGGCGCT AGTGGCGCTG 5400 

GTCAGTTGGT ACCGGAGGTT AATGCTTCTG ACCCTCTTGC AATGGATCCT GTAGCAGGTT 5460 

CTTCGACAGC AGTCGCGACT GCTGGACAAG TTAATCCTAT TGATCCCTGG ATAATTAATA 5520 

ATTTTGTGCA AGCCCCCCAA GGTGAATTTA CTATTTCCCC AAATAATACC CCCGGTGATG 5580 

TTTTGTTTGA TTTGAGTTTG GGTCCCCATC TTAATCCTTT CTTGCTCCAT CTATCACAAA 5640 

TGTATAATGG TTGGGTTGGT AACATGAGAG TCAGGATTAT GCTAGCTGGT AATGCCTTTA 5700 

CTGCGGGGAA GATAATAGTT TCCTGCATAC CCCCTGGTTT TGGTTCACAT AATCTTACTA 5760 

TAGCACAAGC AACTCTCTTT CCACATGTGA TTGCTGATGT TAGGACTCTA GACCCCATTG 5820 

AGGTGCCTTT GGAAGATGTT AGGAATGTTC TCTTTCATAA TAATGATAGA AATCAACAAA 5880 

CCATGCGCCT TGTGTGCATG CTGTACACCC CCCTCCGCAC TGGTGGTGGT ACTGGTGATT 5940 

CTTTTGTAGT TGCAGGGCGA GTTATGACTT GCCCCAGTCC TGATTTTAAT TTCTTGTTTT 6000 

TAGTCCCTCC TACGGTGGAG CAGAAAACCA GGCCCTTCAC ACTCCCAAAT CTGCCATTGA 6060 

GTTCTCTGTC TAACTCACGT GCCCCTCTCC CAATCAGTAG TATGGGCATT TCCCCAGACA 6120 

ATGTCCAGAG TGTGCAGTTC CAAAATGGTC GGTGTACTCT GGATGGCCGC CTGGTTGGCA 6180 

CCACCCCAGT TTCATTGTCA CATGTTGCCA AGATAAGAGG GACCTCCAAT GGCACTGTAA 6240 

TCAACCTTAC TGAATTGGAT GGCACACCCT TTCACCCTTT TGAGGGCCCT GCCCCCATTG 6300 

GGTTTCCAGA CCTCGGTGGT TGTGATTGGC ATATCAATAT GACACAGTTT GGCCATTCTA 6360 

GCCAGACCCA GTATGATGTA GACACCACCC CTGACACTTT TGTCCCCCAT CTTGGTTCAA 6420 

TTCAGGCAAA TGGCATTGGC AGTGGTAATT ATGTTGGTGT TCTTAGCTGG ATTTCCCCCC 6480 
CATCACACCC GTCTGGCTCC CAAGTTGACC TTTGGAAGAT CCCCAATTAT GGGTCAAGTA 6540 

TTACGGAGGC AACACATCTA GCCCCTTCTG TATACCCCCC TGGTTTCGGA GAGGTATTGG 6600 

TCTTTTTCAT GTCAAAAATG CCAGGTCCTG GTGCTTATAA TTTGCCCTGT CTATTACCAC 6660 

AAGAGTACAT TTCACATCTT GCTAGTGAAC AAGCCCCTAC TGTAGGTGAG GCTGCCCTGC 6720 

TCCACTATGT TGACCCTGAT ACCGGTCGGA ATCTTGGGGA ATTCAAAGCA TACCCXGATG 6780 

GTTTCCTCAC TTGTGTCCCC AATGGGGCTA GCTCGGGTCC ACAACAGCTG CCGATCAATG 6840 

GGGTCTTTGT CTTTGTTTCA TGGGTGTCCA GATTTTATCA ATTAAAGCCT GTGGGAACTG 6900 

CCAGCTCGGC AAGAGGTAGG CTTGGTCTGC GCCGATAATG GCCCAAGCCA TAATTGGTGC 6960 

AATTGCTGCT TCCACAGCAG GTAGTGCTCT GGGAGCGGGC ATACAGGTTG GTGGCGAAGC 7020 

GGCCCTCCAA AGCCAAAGGT ATGAACAAAA TTTGCAACTG CAAGAAAATT CTTTTAAACA 7080 

TGACAGGGAA ATGATTGGGT ATCAGGTTGA AGCTTCAAAT CAATTATTGG CTAAAAATTT 7140 

GGCAACTAGA TATTCACTCC TCCGTGCTGG GGGTTTGACC AGTGCTGATG CAGCAAGATC 7200 

TGTGGCAGGA GCTCCAGTCA CCCGCATTGT AGATTGGAAT GGCGTGAGAG TGTCTGCTCC 7260 

CGAGTCCTCT GCTACCACAT TGAGATCCGG TGGCTTCATG TGAGTTCCCA TACCATTTGC 7320 

CTCTAAGCAA AAACAGGTTC AATCATCTGG TATTAGTAAT CCAAATTATT CCCCTTCATC 7380 

CATTTCTCGA ACCACTAGTT GGGTCGAGTC ACAAAACTCA TCGAGATTTG GAAATCTTTC 7440 

TCCATACCAC GCGGAGGCTC TCAATACAGT GTGGTTGACT CCACCCGGTT CAACAGCCTC 7500 

TTCTACACTG TCTTCTGTGC CACGTGGTTA TTTCAATACA GACAGGTTGC CATTATTCGC 7560 

AAATAATAGG CGATGATGTT GTAATATGAA ATGTGGGCAT CATATTCATT TAATTAGGTT 7620 

TAATTAGGTT TAATTTGATG TTAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 7680 

AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 7740 

AAAAAAAAAA AAA 7753 

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1738 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 

Met Ala Leu Gly Leu Ile Gly Gln Val Pro Ala Pro Lys Ala Thr Ser 
1 5 10 15 

Val Asp Val Pro Lys Gln Gln Arg Asp Arg Pro Pro Arg Thr Val Ala 
20 25 30 

Glu Val Gln Gln Asn Leu Arg Trp Thr Glu Arg Pro Gln Asp Gln Asn 
35 40 45 

Val Lys Thr Trp Asp Glu Leu Asp His Thr Thr Lys Gln Gln Ile Leu 
50 55 60 

Asp Glu His Ala Glu Trp Phe Asp Ala Gly Gly Leu Gly Pro Ser Thr 
65 70 75 80 

Leu Pro Thr Ser His Glu Arg Tyr Thr His Glu Asn Asp Glu Gly His 
85 90 95 

Gln Val Lys Trp Ser Ala Arg Glu Gly Val Asp Leu Gly Ile Ser Gly 
100 105 110 

Leu Thr Thr Val Ser Gly Pro Glu Trp Asn Met Cys Pro Leu Pro Pro 
115 120 125 

Val Asp Gln Arg Ser Thr Thr Pro Ala Thr Glu Pro Thr Ile Gly Asp 
130 135 140 

Met Ile Glu Phe Tyr Glu Gly His Ile Tyr His Tyr Ala Ile Tyr Ile 
145 150 155 160 
Gly Gln Gly Lys Thr Val Gly Val His Ser Pro Gln Ala Ala Phe Ser 
165 170 175 

Ile Thr Arg Ile Thr Ile Gln Pro Ile Ser Ala Trp Trp Arg Val Cys 
180 185 190 

Tyr Val Pro Gln Pro Lys Gln Arg Leu Thr Tyr Asp Gln Leu Lys Glu 
195 200 205 

Leu Glu Asn Glu Pro Trp Pro Tyr Ala Ala Val Thr Asn Asn Cys Phe 
210 215 220 

Glu Phe Cys Cys Gln Val Met Cys Leu Glu Asp Thr Trp Leu Gln Arg 
225 230 235 240 

Lys Leu Ile Ser Ser Gly Arg Phe Tyr His Pro Thr Gln Asp Trp Ser 
245 250 255 

Arg Asp Thr Pro Glu Phe Gln Gln Asp Ser Lys Leu Glu Met Val Arg 
260 265 270 

Asp Ala Val Leu Ala Ala Ile Asn Gly Leu Val Ser Arg Pro Phe Lys 
275 280 285 

Asp Leu Leu Gly Lys Leu Lys Pro Leu Asn Val Leu Asn Leu Leu Ser 
290 295 300 

Asn Cys Asp Trp Thr Phe Met Gly Val Val Glu Met Val Val Leu Leu 
305 310 315 320 

Leu Glu Leu Phe Gly Ile Phe Trp Asn Pro Pro Asp Val Ser Asn Phe 
325 330 335 

Ile Ala Ser Leu Leu Pro Asp Phe His Leu Gln Gly Pro Glu Asp Leu 
340 345 350 

Ala Arg Asp Leu Val Pro Ile Val Leu Gly Gly Ile Gly Leu Ala Ile 
355 360 365 

Gly Phe Thr Arg Asp Lys Val Ser Lys Met Met Lys Asn Ala Val Asp 
370 375 380 

Gly Leu Arg Ala Ala Thr Gln Leu Gly Gln Tyr Gly Leu Glu Ile Phe 
385 390 395 400 

Ser Leu Leu Lys Lys Tyr Phe Phe Gly Gly Asp Gln Thr Glu Lys Thr 
405 410 415 

Leu Lys Asp Ile Glu Ser Ala Val Ile Asp Met Glu Val Leu Ser Ser 
420 425 430 

Thr Ser Val Thr Gln Leu Val Arg Asp Lys Gln Ser Ala Arg Ala Tyr 
435 440 445 

Met Ala Ile Leu Asp Asn Glu Glu Glu Lys Ala Arg Lys Leu Ser Val 
450 455 460 

Arg Asn Ala Asp Pro His Val Val Ser Ser Thr Asn Ala Leu Ile Ser 
465 470 475 480 

Arg Ile Ser Met Ala Arg Ala Ala Leu Ala Lys Ala Gln Ala Glu Met 
485 490 495 

Thr Ser Arg Met Arg Pro Val Val Ile Met Met Cys Gly Pro Pro Gly 
500 505 510 
Ile Gly Lys Thr Lys Ala Ala Glu His Leu Ala Lys Arg Leu Ala Asn 
515 520 525 

Glu Ile Arg Pro Gly Gly Lys Val Gly Leu Val Pro Arg Glu Ala Val 
530 535 540 

Asp His Trp Asp Gly Tyr His Gly Glu Glu Val Met Leu Trp Asp Asp 
545 550 555 560 

Tyr Gly Met Thr Lys Ile Gln Glu Asp Cys Asn Lys Leu Gln Ala Ile 
565 570 575 

Ala Asp Ser Ala Pro Leu Thr Leu Asn Cys Asp Arg Ile Glu Asn Lys 
580 585 590 

Gly Met Gln Phe Val Ser Asp Ala Ile Val Ile Thr Thr Asn Ala Pro 
595 600 605 

Gly Pro Ala Pro Val Asp Phe Val Asn Leu Gly Pro Val Cys Arg Arg 
610 615 620 

Val Asp Phe Leu Val Tyr Cys Thr Ala Pro Glu Val Glu His Thr Arg 
625 630 635 640 

Lys Val Ser Pro Gly Asp Thr Thr Ala Leu Lys Asp Cys Phe Lys Pro 
645 650 655 

Asp Phe Ser His Leu Lys Met Glu Leu Ala Pro Gln Gly Gly Phe Asp 
660 665 670 

Asn Gln Gly Asn Thr Pro Phe Gly Lys Gly Val Met Lys Pro Thr Thr 
675 680 685 

Ile Asn Arg Leu Leu Ile Gln Ala Val Ala Leu Thr Met Glu Arg Gln 
690 695 700 

Asp Glu Phe Gln Leu Gln Gly Pro Thr Tyr Asp Phe Asp Thr Asp Arg 
705 710 715 720 

Val Ala Ala Phe Thr Arg Met Ala Arg Ala Asn Gly Leu Gly Leu Ile 
725 730 735 

Ser Met Ala Ser Leu Gly Lys Lys Leu Arg Ser Val Thr Thr Ile Glu 
740 745 750 

Gly Leu Lys Asn Ala Leu Ser Gly Tyr Lys Ile Ser Lys Cys Ser Ile 
755 760 765 

Gln Trp Gln Ser Arg Val Tyr Ile Ile Glu Ser Asp Gly Ala Ser Val 
770 775 780 

Gln Ile Lys Glu Asp Lys Gln Ala Leu Thr Pro Leu Gln Gln Thr Ile 
785 790 795 800 

Asn Thr Ala Ser Leu Ala Ile Thr Arg Leu Lys Ala Ala Arg Ala Val 
805 810 815 

Ala Tyr Ala Ser Cys Phe Gln Ser Ala Ile Thr Thr Ile Leu Gln Met 
820 825 830 

Ala Gly Ser Ala Leu Val Ile Asn Arg Ala Val Lys Arg Met Phe Gly 
835 840 845 

Thr Arg Thr Ala Ala Met Ala Leu Glu Gly Pro Gly Lys Glu His Asn 
850 855 860 
Cys Arg Val His Lys Ala Lys Glu Ala Gly Lys Gly Pro Ile Gly His 
865 870 875 880 

Asp Asp Met Val Glu Arg Phe Gly Leu Cys Glu Thr Glu Glu Glu Glu 
885 890 895 

Ser Glu Asp Gln Ile Gln Met Val Pro Ser Asp Ala Val Pro Glu Gly 
900 905 910 

Lys Asn Lys Gly Lys Thr Lys Lys Gly Arg Gly Arg Lys Asn Asn Tyr 
915 920 925 

Asn Ala Phe Ser Arg Arg Gly Leu Ser Asp Glu Glu Tyr Glu Glu Tyr 
930 935 940 

Lys Lys Ile Arg Glu Glu Lys Asn Gly Asn Tyr Ser Ile Gln Glu Tyr 
945 950 955 960 

Leu Glu Asp Arg Gln Arg Tyr Glu Glu Glu Leu Ala Glu Val Gln Ala 
965 970 975 

Gly Gly Asp Gly Gly Ile Gly Glu Thr Glu Met Glu Ile Arg His Arg 
980 985 990 

Val Phe Tyr Lys Ser Lys Ser Lys Lys His Gln Gln Glu Gln Arg Arg 
995 1000 1005 

Gln Leu Gly Leu Val Thr Gly Ser Asp Ile Arg Lys Arg Lys Pro Ile 
1010 1015 1020 

Asp Trp Thr Pro Pro Lys Asn Glu Trp Ala Asp Asp Asp Arg Glu Val 
1025 1030 1035 1040 

Asp Tyr Asn Glu Lys Ile Asn Phe Glu Ala Pro Pro Thr Leu Trp Ser 
1045 1050 1055 

Arg Val Thr Lys Phe Gly Ser Gly Trp Gly Phe Trp Val Ser Pro Thr 
1060 1065 1070 

Val Phe Ile Thr Thr Thr His Val Val Pro Thr Gly Val Lys Glu Phe 
1075 1080 1085 

Phe Gly Glu Pro Leu Ser Ser Ile Ala Ile His Gln Ala Gly Glu Phe 
1090 1095 1100 

Thr Gln Phe Arg Phe Ser Lys Lys Met Arg Pro Asp Leu Thr Gly Met 
1105 1110 1115 1120 

Val Leu Glu Glu Gly Cys Pro Glu Gly Thr Val Cys Ser Val Leu Ile 
1125 1130 1135 

Lys Arg Asp Ser Gly Glu Leu Leu Pro Leu Ala Val Arg Met Gly Ala 
1140 1145 1150 

Ile Ala Ser Met Arg Ile Gln Gly Arg Leu Val His Gly Gln Ser Gly 
1155 1160 1165 

Met Leu Leu Thr Gly Ala Asn Ala Lys Gly Met Asp Leu Gly Thr Ile 
1170 1175 1180 

Pro Gly Asp Cys Gly Ala Pro Tyr Val His Lys Arg Gly Asn Asp Trp 
1185 1190 1195 1200 

Val Val Cys Gly Val His Ala Ala Ala Thr Lys Ser Gly Asn Thr Val 
1205 1210 1215 

Val Cys Ala Val Gln Ala Gly Glu Gly Glu Thr Ala Leu Glu Gly Gly 
1220 1225 1230 

Asp Lys Gly His Tyr Ala Gly His Glu Ile Val Arg Tyr Gly Ser Gly 
1235 1240 1245 

Pro Ala Leu Ser Thr Lys Thr Lys Phe Trp Arg Ser Ser Pro Glu Pro 
1250 1255 1260 

Leu Pro Pro Gly Val Tyr Glu Pro Ala Tyr Leu Gly Gly Lys Asp Pro 
1265 1270 1275 1280 

Arg Val Gln Asn Gly Pro Ser Leu Gln Gln Val Leu Arg Asp Gln Leu 
1285 1290 1295 

Lys Pro Phe Ala Asp Pro Arg Gly Arg Met Pro Glu Pro Gly Leu Leu 
1300 1305 1310 

Glu Ala Ala Val Glu Thr Val Thr Ser Met Leu Glu Gln Thr Met Asp 
1315 1320 1325 

Thr Pro Ser Pro Trp Ser Tyr Ala Asp Ala Cys Gln Ser Leu Asp Lys 
1330 1335 1340 

Thr Thr Ser Ser Gly Tyr Pro His His Lys Arg Lys Asn Asp Asp Trp 
1345 1350 1355 1360 

Asn Gly Thr Thr Phe Val Gly Glu Leu Gly Glu Gln Ala Ala His Ala 
1365 1370 1375 

Asn Asn Met Tyr Glu Asn Ala Lys His Met Lys Pro Ile Tyr Thr Ala 
1380 1385 1390 

Ala Leu Lys Asp Glu Leu Val Lys Pro Glu Lys Ile Tyr Gln Lys Val 
1395 1400 1405 

Lys Lys Arg Leu Leu Trp Gly Ala Asp Leu Gly Thr Val Val Arg Ala 
1410 1415 1420 

Ala Arg Ala Phe Gly Pro Phe Cys Asp Ala Ile Lys Ser His Val Ile 
1425 1430 1435 1440 

Lys Leu Pro Ile Lys Val Gly Met Asn Thr Ile Glu Asp Gly Pro Leu 
1445 1450 1455 

Ile Tyr Ala Glu His Ala Lys Tyr Lys Asn His Phe Asp Ala Asp Tyr 
1460 1465 1470 

Thr Ala Trp Asp Ser Thr Gln Asn Arg Gln Ile Met Thr Glu Ser Phe 
1475 1480 1485 

Ser Ile Met Ser Arg Leu Thr Ala Ser Pro Glu Leu Ala Glu Val Val 
1490 1495 1500 

Ala Gln Asp Leu Leu Ala Pro Ser Glu Met Asp Val Gly Asp Tyr Val 
1505 1510 1515 1520 

Ile Arg Val Lys Glu Gly Leu Pro Ser Gly Phe Pro Cys Thr Ser Gln 
1525 1530 1535 

Val Asn Ser Ile Asn His Trp Ile Ile Thr Leu Cys Ala Leu Ser Glu 
1540 1545 1550 

Ala Thr Gly Leu Ser Pro Asp Val Val Gln Ser Met Ser Tyr Phe Ser 
1555 1560 1565 
Phe Tyr Gly Asp Asp Glu Ile Val Ser Thr Asp Ile Asp Phe Asp Pro 
1570 1575 1580 

Ala Arg Leu Thr Gln Ile Leu Lys Glu Tyr Gly Leu Lys Pro Thr Arg 
1585 1590 1595 1600 

Pro Asp Lys Thr Glu Gly Pro Ile Gln Val Arg Lys Asn Val Asp Gly 
1605 1610 1615 

Leu Val Phe Leu Arg Arg Thr Ile Ser Arg Asp Ala Ala Gly Phe Gln 
1620 1625 1630 

Gly Arg Leu Asp Arg Ala Ser Ile Glu Arg Gln Ile Phe Trp Thr Arg 
1635 1640 1645 

Gly Pro Asn His Ser Asp Pro Ser Glu Thr Leu Val Pro His Thr Gln 
1650 1655 1660 

Arg Lys Ile Gln Leu Ile Ser Leu Leu Gly Glu Ala Ser Leu His Gly 
1665 1670 1675 1680 

Glu Lys Phe Tyr Arg Lys Ile Ser Ser Lys Val Ile His Glu Ile Lys 
1685 1690 1695 

Thr Gly Gly Leu Glu Met Tyr Val Pro Gly Trp Gln Ala Met Phe Arg 
1700 1705 1710 

Trp Met Arg Phe His Asp Leu Gly Leu Trp Thr Gly Asp Arg Asp Leu 
1715 1720 1725 

Leu Pro Glu Phe Val Asn Asp Asp Gly Val 
1730 1735 



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 530 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

Met Met Met Ala Ser Lys Asp Ala Thr Ser Ser Val Asp Gly 
1 5 10 

Ala Ser Gly Ala Gly Gln Leu Val Pro Glu Val Asn Ala Ser Asp Pro 
15 20 25 30 

Leu Ala Met Asp Pro Val Ala Gly Ser Ser Thr Ala Val Ala Thr Ala 
35 40 45 

Gly Gln Val Asn Pro Ile Asp Pro Trp Ile Ile Asn Asn Phe Val Gln 
50 55 60 

Ala Pro Gln Gly Glu Phe Thr Ile Ser Pro Asn Asn Thr Pro Gly Asp 
65 70 75 

Val Leu Phe Asp Leu Ser Leu Gly Pro His Leu Asn Pro Phe Leu Leu 
80 85 90 

His Leu Ser Gln Met Tyr Asn Gly Trp Val Gly Asn Met Arg Val Arg 
95 100 105 110 
Ile Met Leu Ala Gly Asn Ala Phe Thr Ala Gly Lys Ile Ile Val Ser 
115 120 125 

Cys Ile Pro Pro Gly Phe Gly Ser His Asn Leu Thr Ile Ala Gln Ala 
130 135 140 

Thr Leu Phe Pro His Val Ile Ala Asp Val Arg Thr Leu Asp Pro Ile 
145 150 155 

Glu Val Pro Leu Glu Asp Val Arg Asn Val Leu Phe His Asn Asn Asp 
160 165 170 

Arg Asn Gln Gln Thr Met Arg Leu Val Cys Met Leu Tyr Thr Pro Leu 
175 180 185 190 

Arg Thr Gly Gly Gly Thr Gly Asp Ser Phe Val Val Ala Gly Arg Val 
195 200 205 

Met Thr Cys Pro Ser Pro Asp Phe Asn Phe Leu Phe Leu Val Pro Pro 
210 215 220 

Thr Val Glu Gln Lys Thr Arg Pro Phe Thr Leu Pro Asn Leu Pro Leu 
225 230 235 

Ser Ser Leu Ser Asn Ser Arg Ala Pro Leu Pro Ile Ser Ser Met Gly 
240 245 250 

Ile Ser Pro Asp Asn Val Gln Ser Val Gln Phe Gln Asn Gly Arg Cys 
255 260 265 270 

Thr Leu Asp Gly Arg Leu Val Gly Thr Thr Pro Val Ser Leu Ser His 
275 280 285 

Val Ala Lys Ile Arg Gly Thr Ser Asn Gly Thr Val Ile Asn Leu Thr 
290 295 300 

Glu Leu Asp Gly Thr Pro Phe His Pro Phe Glu Gly Pro Ala Pro Ile 
305 310 315 

Gly Phe Pro Asp Leu Gly Gly Cys Asp Trp His Ile Asn Met Thr Gln 
320 325 330 

Phe Gly His Ser Ser Gln Thr Gln Tyr Asp Val Asp Thr Thr Pro Asp 
335 340 345 350 

Thr Phe Val Pro His Leu Gly Ser Ile Gln Ala Asn Gly Ile Gly Ser 
355 360 365 

Gly Asn Tyr Val Gly Val Leu Ser Trp Ile Ser Pro Pro Ser His Pro 
370 375 380 

Ser Gly Ser Gln Val Asp Leu Trp Lys Ile Pro Asn Tyr Gly Ser Ser 
385 390 395 

Ile Thr Glu Ala Thr His Leu Ala Pro Ser Val Tyr Pro Pro Gly Phe 
400 405 410 

Gly Glu Val Leu Val Phe Phe Met Ser Lys Met Pro Gly Pro Gly Ala 
415 420 425 430 

Tyr Asn Leu Pro Cys Leu Leu Pro Gln Glu Tyr Ile Ser His Leu Ala 
435 440 445 

Ser Glu Gln Ala Pro Thr Val Gly Glu Ala Ala Leu Leu His Tyr Val 
450 455 460 
Asp Pro Asp Thr Gly Arg Asn Leu Gly Glu Phe Lys Ala Tyr Pro Asp 
465 470 475 

Gly Phe Leu Thr Cys Val Pro Asn Gly Ala Ser Ser Gly Pro Gln Gln 
480 485 490 

Leu Pro Ile Asn Gly Val Phe Val Phe Val Ser Trp Val Ser Arg Phe 
495 500 505 510 

Tyr Gln Leu Lys Pro Val Gly Thr Ala Ser Ser Ala Arg Gly Arg Leu 
515 520 525 

Gly Leu Arg Arg 
530 



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 212 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

Met Ala Gln Ala Ile Ile 
1 5 

Gly Ala Ile Ala Ala Ser Thr Ala Gly Ser Ala Leu Gly Ala Gly Ile 
10 15 20 

Gln Val Gly Gly Glu Ala Ala Leu Gln Ser Gln Arg Tyr Gln Gln Asn 
25 30 35 

Leu Gln Leu Gln Glu Asn Ser Phe Lys His Asp Arg Glu Met Ile Gly 
40 45 50 

Tyr Gln Val Glu Ala Ser Asn Gln Leu Leu Ala Lys Asn Leu Ala Thr 
55 60 65 70 

Arg Tyr Ser Leu Leu Arg Ala Gly Gly Leu Thr Ser Ala Asp Ala Ala 
75 80 85 

Arg Ser Val Ala Gly Ala Pro Val Thr Arg Ile Val Asp Trp Asn Gly 
90 95 100 

Val Arg Val Ser Ala Pro Glu Ser Ser Ala Thr Thr Leu Arg Ser Gly 
105 110 115 

Gly Phe Met Ser Val Pro Ile Pro Phe Ala Ser Lys Gln Lys Gln Val 
120 125 130 

Gln Ser Ser Gly Ile Ser Asn Pro Asn Tyr Ser Pro Ser Ser Ile Ser 
135 140 145 150 

Arg Thr Thr Ser Trp Val Glu Ser Gln Asn Ser Ser Arg Phe Gly Asn 
155 160 165 

Leu Ser Pro Tyr His Ala Glu Ala Leu Asn Thr Val Trp Leu Thr Pro 
170 175 180 

Pro Gly Ser Thr Ala Ser Ser Thr Leu Ser Ser Val Pro Arg Gly Tyr 
185 190 195 

Phe Asn Thr Asp Arg Leu Pro Leu Phe Ala Asn Asn Arg Arg 
200 205 210 



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 551 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: human calicivirus Sapporo 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..549 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 



TGTGATGCTG CCACCACGCT TATAGCCACC GCGGCTTTTA AGGCCGTGGC TACNAGGCTA 60 

CAGGTGGTGA CACCAATGAC ACCAGTTGCT GTTGGCATTA ACATGGACTC TGTTCAGATG 120 

CAAGTGATGA ATGACTCTTT AAAGGGGGGT GTTCTTTACT GTTTGGATTA TTCCAAATGG 180 

GATTCCACAC AAAACCCTGC AGTGACAGCA GCCTCCCTGG CAATATTGGA GAGATTTGCT 240 

GAGCCCCATC CAATTGTGTC TTGTGCCATT GAGGCTCTTT CCTCCCCTGC AGAGGGCTAT 300 

GTCAATGATA TCAAATTTGT GACACGCGGC GGCCTACCAT CTGGGATGCC ATTTACATCT 360 

GTCGTCAATT CTATCAACCA TATGATATAC GTGGCGGCAG CCATCCTGCA GGCATACGAA 420 

AGCCACAATG TCCCATATAC TGGAAACGTC TTCCAAGTGG AGACCGTTCA CACGTATGGT 480 

GATGATTGCA TGTACAGCGT GTGCCCTGCC ACTGCATCAA TTTTCCACAC TGTGCTTGCA 540 

AACCTAACGT C 551 



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 183 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Cys Asp Ala Ala Thr Thr Leu Ile Ala Thr Ala Ala Phe Lys Ala Ala 
1 5 10 15 

Val Xaa Arg Leu Gln Val Val Thr Pro Met Thr Pro Val Ala Val Gly 
20 25 30 

Ile Asn Met Asp Ser Val Gln Met Gln Val Met Asn Asp Ser Leu Lys 
35 40 45 

Gly Gly Val Leu Tyr Cys Leu Asp Tyr Ser Lys Trp Asp Ser Thr Gln 
50 55 60 

Asn Pro Ala Val Thr Ala Ala Ser Leu Ala Ile Leu Glu Arg Phe Ala 
65 70 75 80 

Glu Pro His Pro Ile Val Ser Cys Ala Ile Glu Ala Leu Ser Ser Pro 
85 90 95 

Ala Glu Gly Tyr Val Asn Asp Ile Lys Phe Val Thr Arg Gly Gly Leu 
100 105 110 

Pro Ser Gly Met Pro Phe Thr Ser Val Val Asn Ser Ile Asn His Met 
115 120 125 

Ile Tyr Val Ala Ala Ala Ile Leu Gly Ala Tyr Glu Ser His Asn Val 
130 135 140 

Pro Tyr Thr Gly Asn Val Phe Gln Val Glu Thr Val His Thr Tyr Gly 
145 150 155 160 

Asp Asp Cys Met Tyr Ser Val Cys Pro Ala Thr Ala Ser Ile Phe His 
165 170 175 

Thr Val Leu Ala Asn Leu Thr 
180 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 148 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
TGTGATGCTG CCACCACGCT TATAGCCACC GCGGCTTTTA AGGCCGTGGC TACAGGCTAC 60 
AGGTGGTGAC ACCAATGACA CCAGTTGCTG TTGGCATTAA CATGGACTCT GTTCAGATGC 120 
AAGTGATGAA TGACTCTTTA AAGGGGGG 148 

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 449 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

ATGGACTCTG TTCAGATGCA AGTGATGAAT GACTCTTTAA AGGGGGGTGT TCTTTACTGT 60 

TTGGATTATT CCAAATGGGA TTCCACACAA AACCCTGCAG TGACAGCAGC CTCCCTGGCA 120 

ATATTGGAGA GATTTGCTGA GCCCCATCCA ATTGTGTCTT GTGCCATTGA GGCTCTTTCC 180 

TCCCCTGCAG AGGGCTATGT CAATGATATC AAATTTGTGA CACGCGGCGG CCTACCATCT 240 
GGGATGCCAT TTACATCTGT CGTCAATTCT ATCAACCATA TGATATACGT GGCGGCAGCC 300 

ATCCTGCAGG CATACGAAAG CCACAATGTC CCATATACTG GAAACGTCTT CCAAGTGGAG 360 

ACCGTTCACA CGTATGGTGA TGATTGCATG TACAGCGTGT GCCCTGCCAC TGCATCAATT 420 

TTCCACACTG TGCTTGCAAA CCTAACGTC 449 

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 446 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: human calicivirus Sapporo (Day care) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

ATGGACTCTG TTCAGATGCA AGTGATGAAT GACTCTTTAA AGGGAGGTGT TCTCTACTGC 60 

CTGGATTACT CCAAATGGGA CTCCACACAA AATGCTGCAG TGACAGCAGC ATCCCTNNCA 120 

ATATTGGAGA GATTTGCTGA ACCCCACCCA ATTGTGTCTT GTGCCATTGA GGCCCTGNNC 180 

TCNNCTGCAG AGGGTTACGT TAATGATATC AAGTTTGTGA CACGTGGCGG CCTACCATGT 240 

GGGATGCCAT TCACATCTGT TGTCAATTCC ATCAACCACA TNATATACGT GGCAGCCGCC 300 

ATCCTGCAGG CATACGAAAG CCACAATGTT CCATACACTG GAAATGTCTT CCAAGTGGAG 360 

ACTGTTCACA CGTATGGTGA CGATTGCATG TACAGCGTGT GCCCTGCCAC CGCATCAATT 420 

TTCCACACTG TACTTGCAAA CCTAAC 446 

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 434 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: human calicivirus Houston 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 3..434 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

GGCCATGTTA TAGTGGTGTT CACATGAAAG ATGGCGACAA GATGTTGATA GATGCCAATC 60 

TTCCTTACAA CCAGAAATTA ACTACTATGA TTCATGAGAC TAGGCATAGG ATAGGACAGT 120 

ATATAGATAA TACTTTTGGA AAGACATTTA GACATGGATT GACAAAACCT GCTGACAAGA 180 
CTGTAGATTT GATCTATAAG ACATTGAATT ATGATGATTT TCTGGCAATA ATGCTAATCA 240 

TATATGGGCA AAAGTCGGCC ACTAATACGG AGTTGCAATT CTTGATGGAG AAACTTAGAG 300 

GTTATGAATC TACAATGGAT GACATAGGGA AAGTCTATGG AGATGATAAA ATGAGAGATA 360 

TAATCAAGAA TATTTCTGAT GATGACATAA AGAGTCTTTT AGGGGAGATA AATAGTGATT 420 

ATTCTGGTAA GNAT 434 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 144 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

Pro Cys Tyr Ser Gly Val His Met Lys Asp Gly Asp Lys Met Leu Ile 
1 5 10 15 

Asp Ala Asn Leu Pro Tyr Asn Gln Lys Leu Thr Thr Met Ile His Glu 
20 25 30 

Thr Arg His Arg Ile Gly Gln Tyr Ile Asp Asn Thr Phe Gly Lys Thr 
35 40 45 

Phe Arg His Gly Leu Thr Lys Pro Ala Asp Lys Thr Val Asp Leu Ile 
50 55 60 

Tyr Lys Thr Leu Asn Tyr Asp Asp Phe Leu Ala Ile Met Leu Ile Ile 
65 70 75 80 

Tyr Gly Gln Lys Ser Ala Thr Asn Thr Glu Leu Gln Phe Leu Met Glu 
85 90 95 

Lys Leu Arg Gly Tyr Glu Ser Thr Met Asp Asp Ile Gly Lys Val Tyr 
100 105 110 

Gly Asp Asp Lys Met Arg Asp Ile Ile Lys Asn Ile Ser Asp Asp Asp 
115 120 125 

Ile Lys Ser Leu Leu Gly Glu Ile Asn Ser Asp Tyr Ser Gly Lys Xaa 
130 135 140 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2516 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: SRSV/KY/89 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
GAATAGAGGA TGGCCCTTTA ATTTATGCTG AACATGCCAA GTACAAAAAT CATTTTGATG 60 

CAGATTAGAC AGCATGGGAC TCTACACAAA ATAGACAAAT TATGAGAGAA TGCTTCTCCA 120 

TCATGTCACG CCTTACGGCC TCTCCAGAAC TAGCTGAGGT TGTAGCCCAG GACTTACTAG 180 

CACCATCCGA GATGGATGTG GGCGACTATG TTATAAGGGT CAAAGAAGGC CTACCATCAG 240 

GATTTCCCTG CACTTCTCAA GTGAATAGCA TAAATCACTG GATAATCACC CTTTGTGCAT 300 

TGTCTGAGGC TACTGGCTTA TCACCTGATG TGGTACAGTC CATGTCATAC TTCTCATTCT 360 

ACGGTGATGA TGAGATCGTA TCAACTGACA TAGACTTTGA CCCAACTCGC CTCACCCAAA 420 

TTCTCAAGGA ATACGGCCTC AAGCCAACAA GGCCAGACAA AACAGAAGGA CCAATACAGG 480 

TGAGGAAGAA TGTGGATGGG CTAGTTTTTC TGCGGCGCAC CATCTCCCGG GACGCAGCAG 540 

GGTTCCAAGG TAGACTGGAT AGAGCCTCAA TTGAACGTCA AATTTTCTGG ACCCGCGGGC 600 

CCAACCATTC AGACCCATCA GAGACTCTGG TACCACACAC CCAAAGGAAA GTCCAGCTGA 660 

TCTCACTTCT AGGAGAAGCC TCACTCCACG GGGAAAAATT TTACAGGAAA ATATCTAGCA 720 

AAGTCATACA TGAAATTAAG ACTGGTGGGC TGGAGATGTA TGTCCCAGGG TGGCAGGCCA 780 

TGTTCCGCTG GATGCGCTTC CATGACCTCG GATTGTGGAC AGGAGATCGC AATCTCCTGC 840 

CCGAATTCGT AAATGATGAT GGCGTCTAAG GACGCTACGT CAAGCGTGGA TGGCGCCAGT 900 

GCGTCGGTTC AGTTGGTACC GGAGGTTAAT GCTTCTGACC CTCTTGCAAT GGATCCTGTG 960 

GCGGGTTCTT CAACAGCAGT TGCAACCGCT GGACAAGTTA ACCCTATTGA CCCTTGGATA 1020 

ATCAATAACT TTGTGCAGGC TCCCCAAGGT GAATTTACTA TTTCTCCAAA TAATACCCCC 1080 

GGTGATGTTT TGTTTGATTT GAGTCTAGGC CCTCATCTTA ATCCCTTCTT GTTACATTTG 1140 

TCACAAATGT ATAATGGCTG GGTTGGCAAC ATGAGAGTTA GGATTATGCT GGCTGGTAAT 1200 

GCATTTACTG CAGGCAAAAT TATAGTTTCT TGCATACCTC CTGGCTTTGG CTCCCAACAA 1260 

CTTACTATAG CACAAGCAAC TCTCTTCCCG CATGTGATTG CTGATGTTAG GACTTTAGAC 1320 

CCAATTGAAG TACCCTTGGA AGATGTAAGG AATGTTCTCT TTCATAATAA TGATAGAAAT 1380 

CAACAAACTA TGCGCCTTGT GTGCATGCTT TATACCCCCC TCAGCACTGG TGGCGGTACA 1440 

GGTGATTCTT TTGTGGTTGC AGGGCGAGTC ATGACTTGTC CTAGCCCCGA CTTTAATTTC 1500 

TTGTTCTTGG TTCCTCCCAC AGTGGAAGAG AAGACTAGGC CTTTCACCCT CCCAAATTTA 1560 

CCGCTGAGTT CTTTGTCTAA TTCACGTGCT CCTCTTCCAA TTAGTGGCAT GGGTATTTCT 1620 

CCAGATAATG TTCAGAGTGT GCAGTTCCAA AATGGCCGAT GTACCTTAGA TGGACGTCTT 1680 

GTTGGCACCA CCCCAGTTTC CCTCTCCCAT GTTGCTAAGA TAAGGGGTAC TTCTAATGGT 1740 

ACAGTAATCA ATCTCACCGA ATTGGATGGC ACCCCCTTCC ACCCTTTTGA AGGCCCTGCC 1800 

CCTATTGGTT TTCCAGATCT TGGTGGCTGT GATTGGCATA TTAATATGAC ACAATTTGGA 1860 

CATTCCAGTC AGACTCAGTA TGATGTAGAC ACCACCCCCG ACACCTCCGT CCCTCACTTA 1920 

GGTTCAATCC AGGCGAATGG CATTGGTAGT GGCAACTATA TTGGTGTTCT TAGCTGGGTC 1980 

TCCCCCCCAT CACATCCATC TGGCTCTCAA GTTGATCTCT GGAAGATCCC CAACTATGGG 2040 

TCTAGTATCA CAGAGGCAAC CCATCTAGCT CCCTCTGTCT ATTCTCCTGG CTTTGGAGAG 2100 

GTGCTAGTCT TTTTCATGTC AAAGATACCA GGTCCTGGTG GTGATAGTCT GCCCTGTTTA 2160 

CTGCCACAAG GATATATCTC ACACCTTGCA AGTGAACAAG CCCCAACTGT TGGTGAGGGT 2220 

CCCCTGCTCC ACTATGTTGA CCCTGACACG GACCGGAATC TTGGGGAGTT TAAGGCTTAC 2280 

CCTGATGGTT TCCTAACCTG TGTCCCTAAT GGGGCCAGCT CGGGCCCACA ACAACTACCA 2340 

ATCAATGGAG TCTTTGTCTT TGTTTCATGG GTGTCCAGAT TTTATCAGTT AAAGCCTGTG 2400 

GGAACTGCCA GTACGGCAAG AGGTAGGCTT GGTTTGCGCC GATAATGGCT CAGGCTATAA 2460 

TTGGTGCAAT TGCCGCCTCT ACAGCAGGTA GTGCTTTAGG GGCAGGTATA CAGGTT 2516 



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 124 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: primate calicivirus 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

TGGACGGACC TGCTGTTGAA GATCTCTTCA AAGGCTCGAA CGACCAAAGC ACGATCGGTA 60 

TTGTGTTGAC TACGCAAAGT GGGACTCAAC CCACCACCAA AAGTAACATC CAATCAATGA 120 

CATC 124 



(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 110 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: primate calicivirus 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

GTGAATGACA TCTTCGACTC GATGGACCTA TTCACATATG GTGATGACGG TGTCTACATC 60 

GTCCCACCAC TATATCATCT GTCATGCCCA AGTCTTCACC AACCTGAAAC 110 



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs 
(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
CTTGTTGGTT TGAGGCCATA T 



(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
ATAAAAGTTG GCATGAACA 



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
GTTGACACAA TCTCATCATC 



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
GGCCTGCCAT CTGGATTGCC 



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
GGGCCCCCTG GTATAGGTAA 



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
TGGTGATGAC TATAGCATCA GACACAAA 



(2) INFORMATION FOR SEQ ID NO:21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21: 
ACTCACCCAA ATCCTCCA 



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 
GTTCTGACCA CCTAACCT 



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: 
AGTTTGGGTC CCCATCTTAA TCCTTT

(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24: 
TGAACCAAAA CCAGGGGG 



(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 
AGCAAAGTCA TACATGAAAT 



(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
CCATTATACA TTTGTAG 



(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
ATTATAGTTT CTTGCATA 



(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:
CACACTCTGG ACATTGTCTG 



(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
CATTGGGTTT CCAGACCTA 



(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 
ATAATTGGGG ATCTTCCAAA 



(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:31: 
TAGTGGCATG GGTATTTC 



(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
TATGCCAATC ACAGCCAC 



(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
GTCTGGCTCC CAAGTTGACC 



(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:
CGGTATGAGG GTCAACAT 



(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
TGAGGCTGCC CTGCTCCA 



(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
CCACCGCTGT CCGGGAGG

(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
GTTGCTGTTG GCATTAACA 



(2) INFORMATION FOR SEQ ID NO:38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 126 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Norwalk virus 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 

His Phe Asp Ala Asp Tyr Thr Ala Trp Asp Ser Thr Gln Asn Arg Gln
1 5 10 15

Ile Met Thr Glu Ser Phe Ser Ile Met Ser Arg Leu Thr Ala Ser Pro
20 25 30

Glu Leu Ala Glu Val Val Ala Gln Asp Leu Leu Ala Pro Ser Glu Met
35 40 45

Asp Val Gly Asp Tyr Val Ile Arg Val Lys Glu Gly Pro Ser Gly Phe
50 55 60

Pro Cys Thr Ser Gln Val Asn Ser Ile Asn His Trp Ile Ile Thr Leu
65 70 75 80

Cys Ala Leu Ser Glu Ala Thr Gly Leu Ser Pro Asp Val Val Gln Ser
85 90 95

Met Ser Tyr Phe Ser Phe Tyr Gly Asp Asp Glu Ile Val Ser Thr Asp
100 105 110

Ile Asp Phe Asp Pro Ala Arg Leu Thr Gln Ile Leu Lys Glu
115 120 125



(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 121 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: hepatitis E virus

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:

Val Phe Glu Asn Asp Phe Ser Glu Phe Asp Ser Thr Gln Asn Asn Phe
1 5 10 15

Ser Leu Gly Leu Glu Cys Ala Ile Met Glu Glu Cys Gly Met Pro Gln
20 25 30

Trp Leu Ile Arg Leu Tyr His Leu Ile Arg Ser Ala Trp Ile Leu Gln
35 40 45

Ala Pro Lys Glu Ser Leu Arg Gly Phe Trp Lys Lys His Ser Lys His
50 55 60

Ser Gly Glu Pro Gly Thr Leu Leu Trp Asn Thr Val Trp Asn Met Ala
65 70 75 80

Val Ile Thr His Cys Tyr Asp Phe Arg Asp Phe Gln Val Ala Ala Phe
85 90 95

Lys Gly Asp Asp Ser Ile Val Leu Cys Ser Glu Tyr Arg Gln Ser Pro
100 105 110

Gly Ala Ala Val Leu Ile Ala Gly Cys
115 120

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 127 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: hepatitis C virus 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 

Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser Thr Val Thr Glu Ser 
1 5 10 15

Asp Ile Arg Thr Glu Glu Ala Ile Tyr Gln Cys Cys Asp Leu Asp Pro
20 25 30

Gln Ala Arg Val Ala Ile Lys Ser Leu Thr Glu Arg Leu Tyr Val Gly
35 40 45

Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys Gly Tyr Arg Arg Cys
50 55 60

Arg Ala Ser Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly Asn Thr
65 70 75 80

Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys Arg Ala Ala Gly Leu
85 90 95

Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val Ile Cys
100 105 110

Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser Leu Arg Ala Phe
115 120 125

(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 132 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: hepatitis A virus 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:41: 

Gly Leu Asp Leu Asp Phe Ser Ala Phe Asp Ala Ser Leu Ser Pro Phe 
1 5 10 15 

Met Ile Arg Glu Ala Gly Arg Ile Met Ser Glu Leu Ser Gly Thr Pro
20 25 30

Ser His Phe Gly Thr Ala Leu Ile Asn Thr Ile Ile Tyr Ser Lys His
35 40 45

Leu Leu Tyr Asn Cys Cys Tyr His Val Cys Gly Ser Met Pro Ser Gly
50 55 60

Ser Pro Cys Thr Ala Leu Leu Asn Ser Ile Ile Asn Asn Val Asn Leu
65 70 75 80

Tyr Tyr Val Phe Ser Lys Ile Phe Gly Lys Ser Pro Val Phe Phe Cys
85 90 95

Gln Ala Leu Lys Ile Leu Cys Tyr Gly Asp Asp Val Leu Ile Val Phe
100 105 110

Ser Arg Asp Val Gln Ile Asp Asn Leu Asp Leu Ile Gly Gln Lys Ile
115 120 125 

Val Asp Glu Phe 
130 

(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 158 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Japanese encephalitis virus 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:42: 

Met Tyr Ala Asp Asp Thr Ala Gly Trp Asp Thr Arg Ile Thr Arg Thr
1 5 10 15

Asp Leu Glu Asn Glu Ala Lys Val Leu Glu Leu Leu Asp Gly Glu His
20 25 30

Arg Met Leu Ala Arg Ala Ile Ile Glu Leu Thr Tyr Arg His Lys Val
35 40 45

Val Lys Val Met Arg Pro Ala Ala Glu Gly Lys Thr Val Met Asp Val
50 55 60

Ile Ser Arg Glu Asp Gln Arg Gly Ser Gly Gln Val Val Thr Tyr Ala
65 70 75 80

Leu Asn Thr Phe Thr Asn Ile Ala Val Gln Leu Val Arg Leu Met Glu
85 90 95

Ala Glu Gly Val Ile Gly Pro Gln His Leu Glu Gln Leu Pro Arg Lys
100 105 110

Thr Lys Ile Ala Val Arg Thr Trp Leu Phe Glu Asn Gly Glu Glu Arg
115 120 125

Val Thr Arg Met Ala Ile Ser Gly Asp Asp Cys Val Val Lys Pro Leu
130 135 140 

Asp Asp Arg Phe Ala Thr Ala Leu His Phe Leu Asn Ala Met 
145 150 155 

(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 117 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Poliovirus 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 

Phe Ala Phe Asp Tyr Thr Gly Tyr Asp Ala Ser Leu Ser Pro Ala Trp 
1 5 10 15 

Phe Glu Ala Leu Lys Met Val Leu Glu Lys Ile Gly Phe Gly Asp Arg
20 25 30

Val Asp Tyr Ile Asp Tyr Leu Asn His Ser His His Leu Tyr Lys Asn
35 40 45

Lys Thr Tyr Cys Val Lys Gly Gly Met Pro Ser Gly Cys Ser Gly Thr
50 55 60

Ser Ile Phe Asn Ser Met Ile Asn Asn Leu Ile Ile Arg Thr Leu Leu
65 70 75 80

Leu Lys Thr Tyr Lys Gly Ile Asp Leu Asp His Leu Lys Met Ile Ala
85 90 95

Tyr Gly Asp Asp Val Ile Ala Ser Tyr Pro His Glu Val Asp Ala Ser
100 105 110

Leu Leu Ala Gln Ser
115 

(2) INFORMATION FOR SEQ ID NO: 44:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 121 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Foot-and-mouth disease virus 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:44: 

Val Trp Asp Val Asp Tyr Ser Ala Phe Asp Ala Asn His Cys Ser Asp 
1 5 10 15

Ala Met Asn Ile Met Phe Glu Glu Val Phe Arg Thr Asp Phe Gly Phe
20 25 30

His Pro Asn Ala Glu Trp Ile Leu Lys Thr Leu Val Asn Thr Glu His
35 40 45

Ala Tyr Glu Asn Lys Arg Ile Thr Val Glu Gly Gly Met Pro Ser Gly
50 55 60

Cys Ser Ala Thr Ser Ile Ile Asn Thr Ile Leu Asn Asn Ile Tyr Val
65 70 75 80

Leu Tyr Ala Leu Arg Arg His Tyr Glu Gly Val Glu Leu Asp Thr Tyr
85 90 95

Thr Met Ile Ser Tyr Gly Asp Asp Ile Val Val Ala Ser Asp Tyr Asp
100 105 110 

Leu Asp Phe Glu Ala Leu Lys Pro His 
115 120 

(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 126 amino acids 

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(vi) ORIGINAL SOURCE:

(A) ORGANISM: encephalomyocarditis virus

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 

Val Tyr Asp Val Asp Tyr Ser Asn Phe Asp Ser Thr His Ser Val Ala 
1 5 10 15

Met Phe Arg Leu Leu Ala Glu Glu Phe Phe Thr Pro Glu Asn Gly Phe
20 25 30

Asp Pro Leu Thr Arg Glu Tyr Leu Glu Ser Leu Ala Ile Ser Thr His
35 40 45

Ala Phe Glu Glu Lys Arg Phe Leu Ile Thr Gly Gly Leu Pro Ser Gly
50 55 60

Cys Ala Ala Thr Ser Met Leu Asn Thr Ile Met Asn Asn Ile Ile Ile
65 70 75 80

Arg Ala Gly Leu Tyr Leu Thr Tyr Lys Asn Phe Glu Phe Asp Asp Val
85 90 95

Lys Val Leu Ser Tyr Gly Asp Asp Leu Leu Val Ala Thr Asn Tyr Gln
100 105 110 

Leu Asp Phe Asp Lys Val Arg Ala Ser Leu Ala Lys Thr Gly 
115 120 125 

(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 122 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Sindbis virus 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 

Val Leu Glu Thr Asp Ile Ala Ser Phe Asp Lys Ser Gln Asp Asp Ala
1 5 10 15

Met Ala Leu Thr Gly Leu Met Ile Leu Glu Asp Leu Gly Val Asp Gln
20 25 30

Pro Leu Leu Asp Leu Ile Glu Cys Ala Phe Gly Glu Ile Ser Ser Thr
35 40 45

His Leu Pro Thr Gly Thr Arg Phe Lys Phe Gly Ala Met Met Lys Ser
50 55 60

Gly Met Phe Leu Thr Leu Phe Val Asn Thr Val Leu Asn Val Val Ile
65 70 75 80

Ala Ser Arg Val Leu Glu Glu Arg Leu Lys Thr Ser Arg Cys Ala Ala
85 90 95

Phe Ile Gly Asp Asp Asn Ile Ile His Gly Val Val Ser Asp Lys Glu
100 105 110

Met Ala Glu Arg Cys Ala Thr Trp Leu Asn 
115 120 

(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 124 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: tobacco mosaic virus 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47: 

Val Leu Glu Leu Asp Ile Ser Lys Tyr Asp Lys Ser Gln Asn Glu Phe
1 5 10 15

His Cys Ala Val Glu Tyr Glu Ile Trp Arg Arg Leu Gly Phe Glu Asp
20 25 30

Phe Leu Gly Glu Val Trp Lys Gln Gly His Arg Lys Thr Thr Leu Lys
35 40 45

Asp Ile Thr Ala Gly Tyr Lys Thr Cys Ile Trp Tyr Gln Arg Lys Ser
50 55 60

Gly Asp Val Thr Thr Phe Ile Gly Asn Thr Val Ile Ile Ala Ala Cys
65 70 75 80

Leu Ala Ser Met Leu Pro Met Glu Lys Ile Ile Lys Gly Ala Phe Cys
85 90 95

Gly Asp Asp Ser Leu Leu Tyr Phe Pro Lys Gly Cys Glu Phe Pro Asp
100 105 110

Val Gln His Ser Ala Asn Leu Met Trp Asn Phe Glu
115 120 

(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 125 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: alfalfa mosaic virus 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 

Phe Lys Glu Ile Asp Phe Ser Lys Phe Asp Lys Ser Gln Asn Glu Leu
1 5 10 15

His His Leu Ile Gln Glu Arg Phe Leu Lys Tyr Leu Gly Ile Pro Asn
20 25 30

Glu Phe Leu Thr Leu Trp Phe Asn Ala His Arg Lys Ser Arg Ile Ser
35 40 45

Asp Ser Lys Asn Gly Val Phe Phe Asn Val Asp Phe Gln Arg Arg Thr
50 55 60

Gly Asp Ala Leu Thr Tyr Leu Gly Asn Thr Ile Val Thr Leu Ala Cys
65 70 75 80

Leu Cys His Val Tyr Asp Leu Met Asp Pro Asn Val Lys Phe Val Val
85 90 95

Ala Ser Gly Asp Asp Ser Leu Ile Gly Thr Val Glu Glu Leu Pro Arg
100 105 110

Asp Gln Glu Phe Leu Phe Thr Thr Leu Phe Asn Leu Glu
115 120 125 



(2) INFORMATION FOR SEQ ID NO: 49:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 122 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: brome mosaic virus

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 

Phe Leu Glu Ala Asp Leu Ser Lys Phe Asp Lys Ser Gln Gly Glu Leu
1 5 10 15

His Leu Glu Phe Gln Arg Glu Ile Leu Leu Ala Leu Gly Phe Pro Ala
20 25 30

Pro Leu Thr Asn Trp Trp Ser Asp Phe His Arg Asp Ser Tyr Leu Ser
35 40 45

Asp Pro His Ala Lys Val Gly Met Ser Val Ser Phe Gln Arg Arg Thr
50 55 60

Gly Asp Ala Phe Thr Tyr Phe Gly Asn Thr Leu Val Thr Met Ala Met
65 70 75 80

Ile Ala Tyr Ala Ser Asp Leu Ser Asp Cys Asp Cys Ala Ile Phe Ser
85 90 95

Gly Asp Asp Ser Leu Ile Ile Ser Lys Val Lys Pro Val Leu Asp Thr
100 105 110

Asp Met Phe Thr Ser Leu Phe Asn Met Glu 
115 120 

(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 142 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: cowpea mosaic virus 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 

Val Leu Cys Cys Asp Tyr Ser Ser Phe Asp Gly Leu Leu Ser Lys Gln
1 5 10 15

Val Met Asp Val Ile Ala Ser Met Ile Asn Glu Leu Cys Gly Gly Glu
20 25 30

Asp Gln Leu Lys Asn Ala Arg Arg Asn Leu Leu Met Ala Cys Cys Ser
35 40 45

Arg Leu Ala Ile Cys Lys Asn Thr Val Trp Arg Val Glu Cys Gly Ile
50 55 60

[...] Gly Phe Pro Met Thr Val Ile Val Asn Ser Ile Phe Asn Glu
65 70 75 80

Ile Leu Ile Arg Tyr His Tyr Lys Lys Leu Met Arg Glu Gln Gln Ala
85 90 95

Pro Glu Leu Met Val Gln Ser Phe Asp Lys Leu Ile Gly Leu Val Thr
100 105 110

Tyr Gly Asp Asp Asn Leu Ile Ser Val Asn Ala Val Val Thr Pro Tyr
115 120 125

Phe Asp Gly Lys Lys Leu Lys Gln Ser Leu Ala Gln Gly Gly
130 135 140



(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 
CACGCGGAGG CTCTCAAT 18 



(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52: 
GGTGGCGAAG CGGCCCTC 18 



(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53:
TCAGCAGTTA TAGATATG 18



(2) INFORMATION FOR SEQ ID NO: 54: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:
ATGCTATATA CATAGGTC 



(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 
CAACAGGTAC TACGTGAC 



(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 
TGTGGCCCAA GATTTGCT 



(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57: 
ATAAAAGTTG GCATGAACAC AAAT 



(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58:
GTTGCTGTTG GCATTAACAT GGAC 



(2) INFORMATION FOR SEQ ID NO: 59: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59: 
GTTCCTGTTG GCATAAACAT GGAC 



(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60: 
GTTCCGGTTG GCATTAACAT GGAC 



(2) INFORMATION FOR SEQ ID NO: 61: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61: 
GTTCCGGTTG GTATCAACAT GGAC 



(2) INFORMATION FOR SEQ ID NO: 62: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62: 
GTTGCGGTTG GTGTTGACAT GAGA

(2) INFORMATION FOR SEQ ID NO: 63:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 118 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: SRSV/CDC 6/91 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63: 

ATGCACTTCA CAGGTGAATA GCATCAACCA CTGGATCCTA ACTCTATGTG CATTGTCAGA 60 

AGTCACTGGC TTGTCCCCTG ATGTGATACA ATCACAATCT TATTTCTCAT TTTATGGT 118 



(2) INFORMATION FOR SEQ ID NO: 64: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 118 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: SRSV/UT/88 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64: 

ATGTACCTCA CAAGTGAACA GCATCAATCA CTGGATTTTG ACCTTGTGGG GCCTATCAGA 60 

AGTTACTGGT CTGGCTCCTG ATGTAATACA GTCACAATCT TACTTTTCAT TCTATGGT 118 



(2) INFORMATION FOR SEQ ID NO: 65: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 117 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Snow Mountain Agent/78 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65: 

CTGCACATCA CAGTGGAATT CCATGCCCAC TGGCTCCTCA CACTCTGTGC ACTATCTGAA 60 

GTCACAAACC TGGCTCCTGA CATCATACAA GCTAACTCCT TGTTCTCTTT CTATGGT 117 



(2) INFORMATION FOR SEQ ID NO: 66: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 118 base pairs 

(B) TYPE: nucleic acid 
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(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: SRSV/Cambridge, UK/92

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66: 

CTGCACCTCA CAGTGGAACT CCATTGCCCA CTGGTTGCTT ACTCTGTGTG CCCTTTCTGA 60 

AGTGACAGGA CTAGGCCCCG ACATCATACA AGCTAATTCC ATGTACTCTT TCTATGGT 118 

(2) INFORMATION FOR SEQ ID NO: 67: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 118 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: SRSV/CDC 32

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:67: 

TTGCACCTCA CAGTGGAACT CCATTGCCCT CTGGTTGCTT ACTCTGTGTG CCCTTTCTGA 60 

AGTGACAGGA CTAGGCCCCG ACATCATACA AGCTAATTCC ATGTACTCTT TCTATGGT 118 

(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 118 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Norwalk virus/8FIIa/68 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68:

ATGTACTTCC CAGGTGAACA GCATAAATCA CTGGATAATT ACTCTGTGTG CACTGTCTGA 60 

GGCCACTGGT TTATCACCTG ATGTGGTGCA ATCCATGTCA TATTTCTCAT TTTATGGT 118 

(2) INFORMATION FOR SEQ ID NO: 69: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 118 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(A) ORGANISM: SRSV-3/88

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69: 

CTGCACTTCT CAAGTAAATA GCATAAATCA CTGGATAATC ACCCTTTGTG CACTGTCTGA 60 

GGCTACTGGC TTATCACCTG ATGTGGTGCA GTCCATGTCA TACTTCTCAT TTTACGGT 118 

(2) INFORMATION FOR SEQ ID NO: 70: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 118 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: SRSV/KY89/89 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70: 

CTGCACTTCT CAAGTGAATS GCATAAATCA CTGGATAATC ACCCTTTGTG CATTGTCTGA 60 

GGCTACTGGC TTATCACCTG ATGTGGTACA GTCCATGTCA TACTTCTCAT TCTACGGT 118 

(2) INFORMATION FOR SEQ ID NO: 71: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 279 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Norwalk Virus/8FIIa/68 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71: 

CAATAGAAGA TGGCCCCCTC ATCTATGCTG AGCATGCTAA ATATAAGAAT CATTTTGATG 60 

CAGATTATAC AGCATGGGAC TCAACACAAA ATAGACAAAT TATGACAGAA TCCTTCTCCA 120 

TTATGTCGCG CCTTACGGCC TCACCAGAAT TGGCCGAGGT TGTGGCCCAA GATTTGCTAG 180 

CACCATCTGA GATGGATGTA GGTGATTATG TCATCAGGGT CAAAGAGGGG CTGCCATCTG 240 

GATTCCCATG TACTTCCCAG GTGAACAGCA TAAATCACT 279 

(2) INFORMATION FOR SEQ ID NO: 72: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 279 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: DNA (genomic) 






(vi) ORIGINAL SOURCE: 

(A) ORGANISM: SRSV-3/88 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72: 

CAATAGAGGA TGGCCCTTTA ATTTATGCTG AGCATGCCAA GTACAAAAAT CATTTTGATG 60 

CAGATTACAC AGCATGGGAC TCTACACAAA ATAGACAAAT AATGACAGAA TCCTTTTCCA 120 

TCATGTCACG CCTCACGGCC TCTCCAGAAC TAGCTGAGGT TGTAGCCCAG GACTTGCTAG 180 

CACCATCCGA GATGGATGTG GGTGACTATG TTATAAGGGT CAAAGAAGGC CTACCATCAG 240 

GATTTCCCTG CACTTCTCAA GTAAATAGCA TAAATCACT 279 

(2) INFORMATION FOR SEQ ID NO: 73:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 279 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: SRSV/KY89 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73:

CAATAGAGGA TGGCCCTTTA ATTTATGCTG AACATGCCAA GTACAAAAAT CATTTTGATG 60 

CAGATTACAC AGCATGGGAC TCTACACAAA ATAGACAAAT TATGACAGAA TCCTTCTCCA 120 

TCATGTCACG CCTTACGGCC TCTCCAGAAC TAGCTGAGGT TGTAGCCCAG GACTTACTAG 180 

CACCATCCGA GATGGATGTG GGCGACTATG TTATAAGGGT CAAAGAAGGC CTACCATCAG 240 

GATTTCCCTG CACTTCTCAA GTGAATAGCA TAAATCACT 279 

(2) INFORMATION FOR SEQ ID NO: 74: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 279 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: SRSV/Cambridge, UK/92

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74: 

TGTATGAAGA TGGTACCATA ATATTTGAGA AACATTCCAG ATACAGATAC CACTATGATG 60 

CAGATTATCC CGCTGGGTAC TCCACGCAGC AACGGGCAGT GTTGGCAGCA GCACTTGAAA 120 

TCATGGTGAG GTTCTCTGCT GAACCACAGC TAGCGCAAAT AGTAGCTGAA GATCTGCTAG 180 

CACCAAGTGT AGTTGATGTG GGTGACTTCA AGATCACCAT TAATGAAGGC CTACCTTCTG 240

GTGTGCCCTG CACCTCACAG TGGAACTCCA TTGCCCACT 279

(2) INFORMATION FOR SEQ ID NO: 75: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 277 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Snow Mountain Agent/78

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75: 

GAATGAGGAT GGACCCATAA TTTTTGAAAA GCACTCCAGG TTCTCATACC ACTATGATGC 60 

AGATTACTCA CGCTGGGACT CAACCCAACA GAGGGCAGTG CTAGCTGCAG CCTTGGAAAT 120 

CATGGTAAAA TTCTCACCAG AACCACATTT GGCCCAAATT GTTGCAGAGG ATCTCCTAGC 180 

CCCCAGTGTG ATGGATGTAG GTGATTTCAA AATAACAATT AATGAGGGAC TGCCCTCGGG 240 

AGTACCCTGC ACATCACAGT GGAATTCCAT GCCCACT 277 
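
The polymerase-region entries above that share a length can be compared base by base. The following is a minimal sketch of such a comparison, not a method described in this application: the helper names `clean` and `percent_identity` are hypothetical, and the comparison is ungapped, so it assumes the two entries are already aligned position for position. That assumption is only plausible for entries of equal length, here the 279-base entries SEQ ID NO: 71 (Norwalk virus/8FIIa/68) and SEQ ID NO: 72 (SRSV-3/88), whose bases are copied from the listing.

```python
def clean(seq: str) -> str:
    """Keep only letters, dropping the spacing, line breaks and
    position numbers used in the sequence listing layout."""
    return "".join(ch for ch in seq if ch.isalpha()).upper()


def percent_identity(a: str, b: str) -> float:
    """Ungapped, position-by-position identity of two equal-length
    sequences (an assumption: no insertions or deletions)."""
    if len(a) != len(b):
        raise ValueError("ungapped comparison needs equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# SEQ ID NO: 71, Norwalk virus/8FIIa/68 (279 bases)
SEQ_71 = clean("""
CAATAGAAGA TGGCCCCCTC ATCTATGCTG AGCATGCTAA ATATAAGAAT CATTTTGATG
CAGATTATAC AGCATGGGAC TCAACACAAA ATAGACAAAT TATGACAGAA TCCTTCTCCA
TTATGTCGCG CCTTACGGCC TCACCAGAAT TGGCCGAGGT TGTGGCCCAA GATTTGCTAG
CACCATCTGA GATGGATGTA GGTGATTATG TCATCAGGGT CAAAGAGGGG CTGCCATCTG
GATTCCCATG TACTTCCCAG GTGAACAGCA TAAATCACT
""")

# SEQ ID NO: 72, SRSV-3/88 (279 bases)
SEQ_72 = clean("""
CAATAGAGGA TGGCCCTTTA ATTTATGCTG AGCATGCCAA GTACAAAAAT CATTTTGATG
CAGATTACAC AGCATGGGAC TCTACACAAA ATAGACAAAT AATGACAGAA TCCTTTTCCA
TCATGTCACG CCTCACGGCC TCTCCAGAAC TAGCTGAGGT TGTAGCCCAG GACTTGCTAG
CACCATCCGA GATGGATGTG GGTGACTATG TTATAAGGGT CAAAGAAGGC CTACCATCAG
GATTTCCCTG CACTTCTCAA GTAAATAGCA TAAATCACT
""")

if __name__ == "__main__":
    # Check the declared lengths before comparing.
    print(len(SEQ_71), len(SEQ_72))
    print(f"ungapped identity: {percent_identity(SEQ_71, SEQ_72):.1f}%")
```

Checking the cleaned length against the declared LENGTH field first is a cheap guard against transcription errors. Entries of different length, such as SEQ ID NO: 75 (277 bases), would need a gapped alignment instead.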
CLAIMS

1. A cDNA sequence of the formula shown in Table 2 and 
fragments and derivatives thereof having sufficient size to bind a Norwalk 
or Norwalk-related virus genome. 

2. A protein encoded by nucleotides including nucleotides 1
through 7753 of the Norwalk virus genome shown in Table 2 or fragments
or derivatives thereof.

3. The protein of claim 2, wherein said protein is produced in 
a prokaryotic expression system or a eukaryotic expression system. 

4. The protein of claim 2, wherein said protein is produced by
chemical methods.

5. A protein encoded by nucleotides 146 through 5359 of the 
Norwalk virus genome shown in Table 2 or fragments or derivatives 
thereof. 

6. The protein of claim 5, wherein said protein is produced in
a prokaryotic expression system or eukaryotic expression system.

7. The protein of claim 5, wherein said protein is produced by 
chemical methods. 

8. A RNA-dependent RNA polymerase encoded by nucleotides
4543 to 4924 of the Norwalk virus genome shown in Table 2 or fragments,

9. The RNA polymerase of claim 8, wherein said RNA 
polymerase is produced in a prokaryotic expression system or a eukaryotic 
expression system.

10. The RNA polymerase of claim 8, wherein said RNA
polymerase is produced by chemical methods.

11. A protein encoded by nucleotides 5337 through 7573 of the
Norwalk virus genome shown in Table 2 or fragments or derivatives
thereof.

12. The protein of claim 11, wherein said protein is produced in 
a prokaryotic expression system or eukaryotic expression system. 

13. The protein of claim 11, wherein said protein is produced by 
chemical methods. 

14. A protein encoded by nucleotides 5346 through 6935 of the
Norwalk virus genome shown in Table 2 or fragments or derivatives
thereof.

15. The protein of claim 14, wherein said protein is produced in 
a prokaryotic expression system or eukaryotic expression system. 

16. The protein of claim 14, wherein said protein is produced by
chemical methods.

17. A protein encoded by nucleotides 6938 through 7573 of the 
Norwalk virus genome shown in Table 2 or fragments or derivatives 
thereof. 

18. The protein of claim 17, wherein said protein is produced in
a prokaryotic expression system or eukaryotic expression system.

19. The protein of claim 17, wherein said protein is produced by 
chemical methods.

20. A method of making a RNA probe to detect Norwalk or
Norwalk-related viruses, comprising the steps of:

subcloning a Norwalk virus cDNA clone into a transcription
vector;

growing said cDNA containing transcription vector;

adding RNA polymerase to generate single stranded RNA by
in vitro transcription; and

isolating said single stranded RNA.

21. A method of identifying Norwalk or Norwalk-related viruses
in a sample suspected of containing Norwalk or Norwalk-related viruses,
comprising the steps of:

adding a cDNA or a RNA probe specific to Norwalk virus or
a Norwalk-related virus to said sample to be tested under
conditions in which the cDNA or RNA probe will bind to the
Norwalk or Norwalk-related virus genome; and

measuring the amount of binding of said cDNA or RNA
probe.

22. The method of claim 21, wherein said sample is selected from 
the group consisting of food, water and stool. 

23. The method of claim 21, wherein said cDNA is selected from
a group consisting of pUCNV-953, pUCNV-4145, pUCNV-4095,
pUCNV-5030 and pUCNV-5101 or fragments or derivatives thereof.

24. A method of identifying Norwalk or Norwalk-related viruses
in a sample suspected of containing Norwalk or Norwalk-related viruses
comprising the steps of:

adding at least two oligonucleotides each of about 10
nucleotides or greater to said sample under conditions in which
said oligonucleotides bind to the Norwalk or Norwalk-related virus
genome;

amplifying a nucleotide sequence between said bound
oligonucleotides; and

measuring the amount of amplified sequence.

25. A method of identifying Norwalk or Norwalk-related viruses
in a sample suspected of containing Norwalk or Norwalk-related viruses
comprising the steps of:

isolating said nucleic acids using CTAB procedure; 
amplifying nucleic acid; and 
measuring the amplified product. 

26. The method of claim 25, wherein the CTAB procedure 
includes: 

extracting said sample with genetron;
removing the supernatant of said genetron-extracted
sample;

precipitating viruses in said supernatant with polyethylene
glycol;

treating said precipitate with proteinase K in the presence 
of SDS at about 30° minutes; 

sequentially extracting said treated precipitate with phenol- 
chloroform and then chloroform; 

forming a mixture by adding a solution of about 5% CTAB
and about 0.4M NaCl to said supernatant of said sequentially
extracted sample at a ratio of about 5:2 sample:CTAB;
incubating said mixture;

centrifuging said mixture to collect nucleic acids;
suspending said nucleic acids in 1M NaCl and thereafter
extracting with chloroform.
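
As a worked illustration of the mixing ratio recited in claim 26, the sketch below computes the volume of CTAB solution to add to a given sample volume at about 5:2 sample to CTAB solution, and the resulting contribution of the added solution to the final CTAB and NaCl concentrations. The function name `ctab_mix` and the 500 microlitre example volume are assumptions made for illustration; the calculation ignores any salt already present in the sample.

```python
def ctab_mix(sample_volume_ul: float,
             ctab_percent: float = 5.0,
             nacl_molar: float = 0.4,
             ratio_sample: float = 5.0,
             ratio_ctab: float = 2.0):
    """Return (CTAB solution volume, final % CTAB, final M NaCl from the added solution).

    Defaults follow the figures in the claim: about 5% CTAB, about 0.4 M NaCl,
    mixed at about 5:2 sample to CTAB solution.
    """
    ctab_volume_ul = sample_volume_ul * ratio_ctab / ratio_sample
    total_volume_ul = sample_volume_ul + ctab_volume_ul
    dilution = ctab_volume_ul / total_volume_ul  # fraction of the mixture that is CTAB solution
    return ctab_volume_ul, ctab_percent * dilution, nacl_molar * dilution


# Hypothetical example: 500 microlitres of extracted supernatant.
volume, final_ctab, final_nacl = ctab_mix(500.0)
print(f"add {volume:.0f} ul CTAB solution; "
      f"final about {final_ctab:.2f}% CTAB, {final_nacl:.3f} M NaCl from the added solution")
```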

27. A method of claim 25 further comprising: 
performing reverse transcription on said nucleic acids; 

amplifying nucleic acids using primers; and
detecting the amplified nucleic acids using agarose gel
electrophoresis. 

28. A method of cloning Norwalk or pathogens from food, 
biological and environmental samples, comprising: 

isolating said nucleic acids using CTAB procedure;

amplifying nucleic acids; and 

incorporating said amplified nucleic acids into vectors. 

29. A primer sequence of the formula CTT GTT GGT TTG AGG 
CCA TAT. 

30. A primer sequence of the formula ATA AAA GTT GGC ATG
AAC A.

31. A primer sequence of the formula GTT GAC ACA ATC TCA
TCA TC.

32. A primer sequence of the formula GGC CTG CCA TCT GGA
TTG CC.

33. A primer sequence of the formula GGG CCC CCT GGT ATA
GGT AA.

34. A primer sequence of the formula TGG TGA TGA CTA TAG
CAT CAG ACA CAA A.

35. A primer sequence of the formula ACT CAC CCA AAT CCT
CCA.

36. A primer sequence of the formula GTT CTG ACC ACC TAA
CCT.

37. A primer sequence of the formula AGT TTG GGT CCC CAT
CTT AAT CCT TT.

38. A primer sequence of the formula TGA ACC AAA ACC AGG
GGG.

39. A primer sequence of the formula AGC AAA GTC ATA CAT
GAA AT.

40. A primer sequence of the formula CCA TTA TAC ATT TGT
AG.

41. A primer sequence of the formula ATT ATA GTT TCT TGC
ATA.

42. A primer sequence of the formula CAC ACT CTG GAC ATT
GTC TG.

43. A primer sequence of the formula CAT TGG GTT TCC AGA
CCT A.

44. A primer sequence of the formula ATA ATT GGG GAT CTT
CCA AA.

45. A primer sequence of the formula TAG TGG CAT GGG TAT
TTC.

46. A primer sequence of the formula TAT GCC AAT CAC AGC
CAC.

47. A primer sequence of the formula GTC TGG CTC CCA AGT
TGA CC.

48. A primer sequence of the formula CGG TAT GAG GGT CAA
CAT.

49. A primer sequence of the formula TGA GGC TGC CCT GCT
CCA.

50. A primer sequence of the formula CCA CCG CTG TCC GGG
AGG.

51. A primer sequence of the formula GTT GCT GTT GGC ATT
AAC A.
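
For readers who want to inspect primer sequences such as those recited above, the following minimal sketch computes the length, GC fraction and a rough melting temperature of an oligonucleotide. The melting temperature uses the general Wallace rule of thumb (2 degrees per A or T, 4 degrees per G or C), which is an approximation intended for short oligonucleotides and is not taken from this document. The function names are hypothetical.

```python
def clean(seq: str) -> str:
    """Remove the triplet spacing used in the claims and upper-case the sequence."""
    return "".join(seq.split()).upper()


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    s = clean(seq)
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(seq: str) -> int:
    """Rule-of-thumb melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    s = clean(seq)
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


primer_29 = "CTT GTT GGT TTG AGG CCA TAT"  # sequence recited in claim 29

print(len(clean(primer_29)))             # 21
print(round(gc_fraction(primer_29), 2))  # 0.43
print(wallace_tm(primer_29))             # 60
```

The estimate is only a starting point for comparing primers; it says nothing about specificity for a given template.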

52. A method of making a probe to detect Norwalk or Norwalk-related
viruses, comprising the steps of:

synthesizing one or more short or long nucleotides from the
Norwalk virus genome shown in Table 2 or fragments or
derivatives thereof.

53. The probe produced by the method of claim 52.

54. A method of making a probe to detect Norwalk or Norwalk-related
viruses, comprising the step of:

synthesizing one or more short or long nucleotides from a
subgenomic region of the Norwalk virus genome shown in Table 2
or fragments or derivatives thereof.

55. The probe produced by the method of claim 54.

56. The probe of claim 55, wherein said subgenomic region 
includes a sequence of the formula CTT GTT GGT TTG AGG CCA TAT. 
57. The probe of claim 55, wherein said subgenomic region
includes a nucleotide sequence of the formula ATA AAA GTT GGC ATG
AAC A.

58. The probe of claim 55, wherein said subgenomic region
includes a nucleotide sequence of the formula GTT GAC ACA ATC TCA
TCA TC.

59. The probe of claim 55, wherein said subgenomic region
includes a nucleotide sequence of the formula GGC CTG CCA TCT GGA
TTG CC.

60. The probe of claim 55, wherein said subgenomic region
includes a nucleotide sequence of the formula GGG CCC CCT GGT ATA
GGT AA.

61. The probe of claim 55, wherein said subgenomic region
includes a nucleotide sequence of the formula TGG TGA TGA CTA TAG
CAT CAG ACA CAA A.

62. The probe of claim 55, wherein said subgenomic region
includes a nucleotide sequence of the formula GTT CTG ACC ACC TAA
CCT.

63. The probe of claim 55, wherein said subgenomic region
includes a nucleotide sequence of the formula AGT TTG GGT CCC CAT
CTT AAT CCT TT.

64. The probe of claim 55, wherein said subgenomic region
includes a nucleotide sequence of the formula TGA ACC AAA ACC AGG
GGG.

65. The probe of claim 55, wherein said subgenomic region
includes a nucleotide sequence of the formula AGC AAA GTC ATA CAT
GAA AT.

66. The probe of claim 55, wherein said subgenomic region
includes a nucleotide sequence of the formula CCA TTA TAC ATT TGT
AG.

67. The probe of claim 55, wherein said subgenomic region
includes a nucleotide sequence of the formula CAC ACT CTG GAC ATT
GTC TG.

68. The probe of claim 55, wherein said subgenomic region
includes a nucleotide sequence of the formula CAT TGG GTT TCC AGA
CCT A.

69. The probe of claim 55, wherein said subgenomic region
includes a nucleotide sequence of the formula ATA ATT GGG GAT CTT
CCA AA.

70. The probe of claim 55, wherein said subgenomic region
includes a nucleotide sequence of the formula TAT GCC AAT CAC AGC
CAC.

71. The probe of claim 55, wherein said subgenomic region
includes a nucleotide sequence of the formula GTC TGG CTC CCA AGT
TGA CC.

72. The probe of claim 55, wherein said subgenomic region
includes a nucleotide sequence of the formula CGG TAT CAG GGT CAA
CAT.

73. The probe of claim 55, wherein said subgenomic region
includes a nucleotide sequence of the formula TGA GGC TGC CCT GCT
CCA.

74. The probe of claim 55, wherein said subgenomic region
includes a nucleotide sequence of the formula CCA CCG CTG TCC GGG
AGG.

75. The method of claim 54, wherein said subgenomic region 
includes said Norwalk genome's first open reading frame. 

76. The probe produced by the method of claim 75. 

77. The method of claim 54, wherein said subgenomic region
includes nucleotides 146 through 5359.

78. The probe produced by the method of claim 77.

79. The method of claim 54, wherein said nucleotides code for a
picornavirus 2C-like protein, a 3C-like protease, an RNA-dependent RNA
polymerase or any combination thereof.

80. The probe produced by the method of claim 79.

81. The method of claim 54, wherein said nucleotide codes for a
capsid protein.

82. The probe produced by the method of claim 81.

83. The method of claim 54, wherein said subgenomic region
includes nucleotides 5337 through 7573.

84. The probe produced by the method of claim 83.

85. The method of claim 54, wherein said subgenomic region
includes nucleotides 5346 through 6935.

86. The probe produced by the method of claim 85.

87. The method of claim 54, wherein said subgenomic region
includes nucleotides 6938 through 7573.

88. The probe produced by the method of claim 87. 
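
The nucleotide ranges recited in claims 77 through 87 are 1-based and inclusive, which is easy to get wrong when slicing a sequence in software. The sketch below shows one way to extract such a region and report its length. Because the Table 2 sequence is not reproduced here, a placeholder string of 7753 "N" characters stands in for the genome (the length follows the "1 through 7753" range recited elsewhere in the claims); the names `subregion` and `CLAIMED_RANGES` are hypothetical.

```python
# 1-based inclusive coordinate ranges recited in the claims.
CLAIMED_RANGES = [(146, 5359), (5337, 7573), (5346, 6935), (6938, 7573)]


def subregion(genome: str, start: int, end: int) -> str:
    """Return nucleotides start..end (1-based, inclusive) of a genome string."""
    if not 1 <= start <= end <= len(genome):
        raise ValueError(f"range {start}-{end} is outside 1-{len(genome)}")
    return genome[start - 1:end]


# Placeholder only: replace with the actual Table 2 sequence.
placeholder_genome = "N" * 7753

for start, end in CLAIMED_RANGES:
    region = subregion(placeholder_genome, start, end)
    print(f"{start}-{end}: {len(region)} nucleotides")
```

With real sequence in place of the placeholder, the returned string is the candidate probe template for the corresponding claim.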

89. A method of making a probe to detect Norwalk-related 
viruses, comprising the steps of: 

selecting one or more nucleotide sequences from the group
consisting of GTTGCTGTTGGCATTAACA,
TAGTGGCATGGGTATTTC, ATTATAGTTTCTTGCATA,
AGCAAAGTCATACATGAAAT, and ACTCACCCAAATCCTCCA;

producing said nucleotide sequence by chemical methods or
in an expression system.

90. The probe produced by the method of claim 89.



91. A kit for detecting an immune response to Norwalk virus, 
comprising: 

a container including a protein encoded by the Norwalk virus 
genome shown in Table 2 or fragments or derivatives thereof. 

92. The kit of claim 91, wherein said protein is selected from the
group consisting of the protein encoded by nucleotides 1 through 7753, the
protein encoded by nucleotides 146 through 5359, the protein encoded by
nucleotides 5337 through 7573, the protein encoded by nucleotides 5346
through 6935, the protein encoded by nucleotides 6938 through 7573 and
any combination thereof.
93. A kit for detecting an immune response to a Norwalk-related
virus, comprising:

a container including a protein encoded by the genome for
said Norwalk-related virus.

94. A method of detecting an immune response to Norwalk virus,
comprising the steps of:

collecting a serum sample from an individual suspected of
having been exposed to Norwalk virus;

selecting a protein encoded by the Norwalk virus genome
shown in Table 2 or fragments or derivatives thereof;

adding said selected protein to said serum in a diagnostic
assay under conditions allowing said selected protein and the serum
to react; and

measuring the amount of reaction of said serum and said
selected protein.

95. The method of claim 94, wherein said diagnostic assay is 
selected from the group consisting of enzyme-linked immunosorbent 
assays, radioimmunoassays and immunoblots. 

96. The method of claim 94, wherein said selected protein is a
capsid protein.

97. The method of claim 94, wherein said selected protein has 
the intrinsic property of being able to form particle(s). 

98. The method of claim 94, wherein said selected protein is
selected from the group consisting of the protein encoded by nucleotides
1 through 7753, the protein encoded by nucleotides 146 through 5359, the
protein encoded by nucleotides 5337 through 7573, the protein encoded by
nucleotides 5346 through 6935, the protein encoded by nucleotides 6938
through 7573 and any combination thereof.

99. A diagnostic assay to detect an immune response to Norwalk
virus, comprising:

selecting a protein encoded in the Norwalk virus genome shown
in Table 2 or fragments or derivatives thereof;
using said protein as an antigen;

adding post-infection serum from a Norwalk-infected
individual under conditions allowing said serum to react with said
antigen; and

measuring the amount of reaction of said serum and said
antigen.

100. The method of claim 99, wherein said protein is a capsid 
protein. 

101. The method of claim 99, wherein said protein has the
intrinsic property of being able to form particle(s).

102. The method of claim 99, wherein said protein is selected from the
group consisting of the protein encoded by nucleotides 1 through 7753, the
protein encoded by nucleotides 146 through 5359, the protein encoded by
nucleotides 5337 through 7573, the protein encoded by nucleotides 5346
through 6935, the protein encoded by nucleotides 6938 through 7573 and
any combination thereof.

103. A kit for detecting Norwalk viruses and Norwalk-related 
viruses, comprising: 

a container including at least one antiserum made from a
protein encoded by the Norwalk virus genome shown in Table 2 or
from a fragment or derivative of said genome.
104. The kit of claim 103, wherein said protein is selected from
the group consisting of the protein encoded by nucleotides 1 through 7753,
the protein encoded by nucleotides 146 through 5359, the protein encoded
by nucleotides 5337 through 7573, the protein encoded by nucleotides
5346 through 6935, the protein encoded by nucleotides 6938 through 7573
and any combination thereof.

105. A method of producing antibodies to Norwalk and Norwalk- 
related viruses, comprising: 

immunizing animals with a protein encoded by the Norwalk
virus genome shown in Table 2 or fragments or derivatives thereof.

106. The method of claim 105, wherein said protein is selected
from the group consisting of the protein encoded by nucleotides 1 through
7753, the protein encoded by nucleotides 146 through 5359, the protein
encoded by nucleotides 5337 through 7573, the protein encoded by
nucleotides 5346 through 6935, the protein encoded by nucleotides 6938
through 7573 and any combination thereof.

107. A vaccine for Norwalk virus, comprising: 

a Norwalk virus antigen encoded by the cDNA sequence of
Norwalk virus shown in Table 2 or fragments or derivatives
thereof.

108. The vaccine of claim 107, wherein said antigen is produced 
using nucleotides 146 through 5359 of the Norwalk virus genome shown 
in Table 2 or a derivative thereof. 

109. The vaccine of claim 107, wherein said antigen is produced
using nucleotides 5337 through 7573 of the Norwalk virus genome shown
in Table 2 or a derivative thereof.

110. The vaccine of claim 107, wherein said antigen is produced
using nucleotides 5346 through 6935 of the Norwalk virus genome shown
in Table 2 or a derivative thereof.

111. The vaccine of claim 107, wherein said antigen is produced
using nucleotides 6938 through 7573 of the Norwalk virus genome shown
in Table 2 or a derivative thereof.



112. The vaccine of claim 107, wherein said antigen has the 
intrinsic property of being able to form particle(s). 

113. A method of immunizing an individual against Norwalk
virus, comprising the step of:

orally or parenterally administering an immunologically
effective dose(s) of the vaccine of claim 107.

114. A method of immunizing an individual against Norwalk
virus, comprising the steps of:

orally and parenterally administering an immunologically
effective dose of the vaccine of claim 107.



115. A cDNA sequence of the human calicivirus Sapporo genome
shown in Figure 9 and fragments and derivatives thereof, said fragments
and derivatives having sufficient size and nucleotide homology to bind a
Norwalk or Norwalk-related virus genome.

116. A protein encoded by nucleotides including nucleotides 1
through 551 of the human calicivirus Sapporo genome shown in Figure 9
or fragments or derivatives thereof.

117. A cDNA subclone of the human calicivirus Sapporo genome
comprising nucleotides 1 through 149 and fragments and derivatives
thereof, said fragments and derivatives having sufficient size and
nucleotide homology to bind a Norwalk or Norwalk-related virus genome.

118. A cDNA subclone of the human calicivirus Sapporo genome
comprising nucleotides 113 through 551 and fragments and derivatives
thereof, said fragments and derivatives having sufficient size and
nucleotide homology to bind a Norwalk or Norwalk-related virus genome.

119. A cDNA sequence of the Day care calicivirus genome shown
in Figure 9 and fragments and derivatives thereof, said fragments and
derivatives having sufficient size and nucleotide homology to bind a
Norwalk or Norwalk-related virus genome.

120. A cDNA sequence of the SRSV/KY/89 genome shown in
Figure 12 and fragments and derivatives thereof, said fragments and
derivatives having sufficient size and nucleotide homology to bind a
Norwalk or Norwalk-related virus genome.

121. A cDNA sequence of the human calicivirus Houston shown
in Table 10 and fragments and derivatives thereof, said fragments and
derivatives having sufficient size and nucleotide homology to bind a
Norwalk or Norwalk-related virus genome.

122. A cDNA subclone of a primate calicivirus comprising the
sequence TGGACGGACC TGCTGTTGAA GATCTCTTCA
AANGGCTCGA ACGACCAAAG CACGATCGGT ATTGTGTTGA
CTACGCAAAG TGGGACTCAA CCCANCCACCA AAAGTAACAT
CCAATCAATN GACATC and fragments and derivatives thereof, said
fragments and derivatives having sufficient size and nucleotide homology
to bind a Norwalk or Norwalk-related virus genome.

123. A cDNA subclone of a primate calicivirus comprising the
sequence GTGANATGNN ACATCTTCGA CTCGATGGAC CTATTCACAT
ATGGTGATGA CGGTGTCTAC ATCGTCCCAC CACTATATCA
TCTGTCATGC CCAAGTCTTC ACCAACCTGA AAC and fragments and
derivatives thereof, said fragments and derivatives having sufficient size
and nucleotide homology to bind a Norwalk or Norwalk-related virus
genome.

124. A method of detecting an immune response to Norwalk or a
Norwalk-related virus, comprising the steps of:

collecting a serum sample from an individual suspected of
having been exposed to Norwalk or a Norwalk-related virus;

selecting a protein encoded by the genomic sequence of a
Norwalk-related virus or fragments or derivatives thereof, said
fragments and derivatives having sufficient size and nucleotide
homology to bind a Norwalk or Norwalk-related virus genome;

adding said selected protein to said serum in a diagnostic
assay under conditions allowing the selected protein and the serum
to react; and

measuring the amount of reaction of said serum and said
selected protein.

125. The method of claim 124, wherein said diagnostic assay is
selected from the group consisting of enzyme-linked immunosorbent
assays, radioimmunoassays and immunoblots.

126. The method of claim 124, wherein said genomic sequence is 
the cDNA sequence of claim 117. 

127. The method of claim 124, wherein said genomic sequence is
the cDNA sequence of claim 119.

128. The method of claim 124, wherein said genomic sequence is
the cDNA sequence of claim 120.

129. The method of claim 124, wherein said genomic sequence is
the cDNA sequence of claim 121.

130. The method of claim 124, wherein said genomic sequence is
the cDNA sequence of claim 122.

131. The method of claim 124, wherein said genomic sequence is
the cDNA sequence of claim 123.

132. A kit for detecting Norwalk viruses and Norwalk-related
viruses, comprising:

a container including at least one antiserum made from a
protein encoded by a genomic sequence of a Norwalk-related virus
genome or from a fragment or derivative of said genomic sequence,
said fragments and derivatives having sufficient size and nucleotide
homology to bind a Norwalk or Norwalk-related virus genome.

133. The kit of claim 132, wherein said genomic sequence is the
cDNA sequence of claim 117.

134. The kit of claim 132, wherein said genomic sequence is the 
cDNA sequence of claim 119. 

135. The kit of claim 132, wherein said genomic sequence is the 
cDNA sequence of claim 120. 

136. The kit of claim 132, wherein said genomic sequence is the
cDNA sequence of claim 121.

137. The kit of claim 132, wherein said genomic sequence is the
cDNA sequence of claim 122.

138. The kit of claim 132, wherein said genomic sequence is the
cDNA sequence of claim 123.

139. A chimeric protein, comprising:

a protein encoded by a Norwalk virus genome combined with
a protein encoded by a genome of a Norwalk-related virus.

140. A method of detecting an immune response to Norwalk virus,
comprising the steps of:

collecting a serum sample from an individual suspected of
having been exposed to Norwalk virus;

adding the chimeric protein of claim 139 to said serum
in a diagnostic assay under conditions allowing said chimeric protein
and the serum to react; and

measuring the amount of reaction of said serum and said
chimeric protein.

141. A vaccine for Norwalk or Norwalk-related viruses,
comprising:

the chimeric protein of claim 139 used as an antigen.

142. A kit for detecting Norwalk or Norwalk-related
viruses, comprising:

a container including at least one antiserum made from the
chimeric protein of claim 139.
  1  TGC TCT GGG AGC GGG CAT ACA GGT TGG TGG CGA CAG GCC CTC CAA
     cys ser gly ser gly his thr gly trp trp arg gln ala leu gln

 46  AGC CAA AGG TAT CAA CAA AAT TTG CAA CTG CAA GAA AAT TCT TTT
     ser gln arg tyr gln gln asn leu gln leu gln glu asn ser phe

 91  AAA CAT GAC AGG GAA ATG ATT GGG TAT CAG GTT GAA GCT TCA AAT
     lys his asp arg glu met ile gly tyr gln val glu ala ser asn

136  CAA TTA TTG GCT AAA AAT TTG GCA ACT AGA TAT TCA CTC CTC CGT
     gln leu leu ala lys asn leu ala thr arg tyr ser leu leu arg

181  GCT GGG GGT TTG ACC AGT GCT GAT GCA GCA AGA TCT GTG GCA GGA
     ala gly gly leu thr ser ala asp ala ala arg ser val ala gly

226  GCT CCA CTC ACC CGC ATT GTA GAT TGG AAT GGC GTG AGA GTG TCT
     ala pro val thr arg ile val asp trp asn gly val arg val ser

271  GCT CCC GAG TCC TCT GCT ACC ACA TTG AGA TCC GGT GGC TTC ATG
     ala pro glu ser ser ala thr thr leu arg ser gly gly phe met

316  TCA GTT CCC ATA CCA TTT GCC TCT AAG CAA AAA CAG GTT CAA TCA
     ser val pro ile pro phe ala ser lys gln lys gln val gln ser

361  TCT GGT ATT AGT AAT CCA AAT TAT TCC CCT TCA TCC ATT TCT CGA
     ser gly ile ser asn pro asn tyr ser pro ser ser ile ser arg

406  ACC ACT AGT TGG GTC GAG TCA CAA AAC TCA TCG AGA TTT GGA AAT
     thr thr ser trp val glu ser gln asn ser ser arg phe gly asn

451  CTT TCT CCA TAC CAC GCG GAG GCT CTC AAT ACA GTG TGG TTG ACT
     leu ser pro tyr his ala glu ala leu asn thr val trp leu thr

496  CCA CCC GGT TCA ACC
     pro pro gly ser thr
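
The codon-by-codon translation shown in the sequence listing above can be reproduced with a short script, which is also a convenient way to spot OCR errors in either line. This is a minimal sketch assuming the standard genetic code; the function name `translate` and the lower-case three-letter output style (chosen to match the listing) are illustrative choices, not part of the original document.

```python
# Standard genetic code, built from the conventional TCAG-ordered amino acid string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

ONE_TO_THREE = {
    "A": "ala", "R": "arg", "N": "asn", "D": "asp", "C": "cys",
    "Q": "gln", "E": "glu", "G": "gly", "H": "his", "I": "ile",
    "L": "leu", "K": "lys", "M": "met", "F": "phe", "P": "pro",
    "S": "ser", "T": "thr", "W": "trp", "Y": "tyr", "V": "val",
    "*": "ter",
}


def translate(seq: str) -> str:
    """Translate a DNA sequence (spaces allowed) into three-letter amino acid codes."""
    s = "".join(seq.split()).upper()
    usable = len(s) - len(s) % 3  # ignore a trailing partial codon
    codons = [s[i:i + 3] for i in range(0, usable, 3)]
    return " ".join(ONE_TO_THREE[CODON_TABLE[codon]] for codon in codons)


first_row = "TGC TCT GGG AGC GGG CAT ACA GGT TGG TGG CGA CAG GCC CTC CAA"
print(translate(first_row))
```

Running the same function over each printed row and comparing the result with the amino acid line beneath it flags any codon whose printed translation disagrees with the standard code.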
SRSV/CDC 6/91    ATGCACTTCA CAGGTGAATA GCATCAACCA CTGGATCCTA   40
SRSV/UT/88       ATGTACCTCA CAAGTGAACA GCATCAATCA CTGGATTTTG   40
SMA/78           CTGCACATCA CAGTGGAATT CCA-TGCCCA CTGGCTCCTC   39
SRSV/Cambridge   CTGCACCTCA CAGTGGAACT CCATTGCCCA CTGGTTGCTT   40
SRSV/CDC 32      TTGCACCTCA CAGTGGAACT CCATTGCCCT CTGGTTGCTT   40
NV/8FIIa/68      ATGTACTTCC CAGGTGAACA GCATAAATCA CTGGATAATT   40
SRSV-3/88        CTGCACTTCT CAAGTAAATA GCATAAATCA CTGGATAATC   40
SRSV/KY89/89     CTGCACTTCT CAAGTGAATA GCATAAATCA CTGGATAATC   40

SRSV/CDC 6/91    ACTCTATGTG CATTGTCAGA AGTCACTGGC TTGTCCCCTG   80
SRSV/UT/88       ACCTTGTGGG GCCTATCAGA AGTTACTGGT CTGGCTCCTG   80
SMA/78           ACACTCTGTG CACTATCTGA AGTCACAAAC CTGGCTCCTG   79
SRSV/Cambridge   ACTCTGTGTG CCCTTTCTGA AGTGACAGGA CTAGGCCCCG   80
SRSV/CDC 32      ACTCTGTGTG CCCTTTCTGA AGTGACAGGA CTAGGCCCCG   80
NV/8FIIa/68      ACTCTCTGTG CACTGTCTGA GGCCACTGGT TTATCACCTG   80
SRSV-3/88        ACCCTTTGTG CACTGTCTGA GGCTACTGGC TTATCACCTG   80
SRSV/KY89/89     ACCCTTTGTG CATTGTCTGA GGCTACTGGC TTATCACCTG   80

SRSV/CDC 6/91    ATGTGATACA ATCACAATCT TATTTCTCAT TTTATGGT    118
SRSV/UT/88       ATGTAATACA GTCACAATCT TACTTTTCAT TCTATGGT    118
SMA/78           ACATCATACA AGCTAACTCC TTGTTCTCTT TCTATGGT    117
SRSV/Cambridge   ACATCATACA AGCTAATTCC ATGTACTCTT TCTATGGT    118
SRSV/CDC 32      ACATCATACA AGCTAATTCC ATGTACTCTT TCTATGGT    118
NV/8FIIa/68      ATGTGGTGCA ATCCATGTCA TATTTCTCAT TTTATGGT    118
SRSV-3/88        ATGTGGTGCA GTCCATGTCA TACTTCTCAT TTTACGGT    118
SRSV/KY89/89     ATGTGGTACA GTCCATGTCA TACTTCTCAT TCTACGGT    118
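
Conserved positions in an alignment like the one above are conventionally marked with an asterisk under each column in which every sequence carries the same base. The sketch below shows how such a marker line could be computed; it uses only the first ten columns of the eight aligned sequences as input. The function name `conserved_mask` is hypothetical, and treating a gap character as non-conserved is an assumption of this sketch.

```python
def conserved_mask(rows):
    """Return a string with '*' under columns identical in every row, ' ' elsewhere.

    A column containing a gap ('-') is treated as not conserved.
    """
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    mask = []
    for column in zip(*rows):
        identical = len(set(column)) == 1 and column[0] != "-"
        mask.append("*" if identical else " ")
    return "".join(mask)


# First ten alignment columns of the eight sequences, in the order listed above.
first_block = [
    "ATGCACTTCA",
    "ATGTACCTCA",
    "CTGCACATCA",
    "CTGCACCTCA",
    "TTGCACCTCA",
    "ATGTACTTCC",
    "CTGCACTTCT",
    "CTGCACTTCT",
]

for row in first_block:
    print(row)
print(conserved_mask(first_block))
```

Applied to full-length rows with the column spacing removed, the same function yields a marker line for the whole alignment.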
Serum dilution (log)

FIG. 16

Figure 17